Input data

The model quantizes the input activations to b bits with absmax quantization, which maps the input into $[-Q_b, Q_b]$ where $Q_b = 2^{b-1}$:

$$\widetilde{x} = \mathrm{Quant}(x) = \mathrm{Clip}\left(x \times \frac{Q_b}{\gamma},\, -Q_b + \epsilon,\, Q_b - \epsilon\right),$$
$$\mathrm{Clip}(x, a, b) = \max(a, \min(b, x)), \qquad \gamma = \|x\|_\infty,$$

where $\epsilon$ is a small floating-point number that prevents overflow when the clipping is applied.
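In PyTorch the absmax quantizer above amounts to a scale followed by a clamp. Here is a minimal per-tensor sketch (the function name absmax_quantize and the concrete eps default are assumptions for illustration; gamma is returned because it is needed later to rescale the output):

import torch

def absmax_quantize(x: torch.Tensor, b: int = 8, eps: float = 1e-5):
    # Q_b = 2^(b-1); gamma = ||x||_inf
    Q_b = 2 ** (b - 1)
    gamma = x.abs().max()
    # Clip(x * Q_b / gamma, -Q_b + eps, Q_b - eps);
    # eps in the denominator only guards against gamma == 0
    x_q = torch.clamp(x * Q_b / (gamma + eps), -Q_b + eps, Q_b - eps)
    return x_q, gamma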
# https://github.com/kyegomez/BitNet/blob/main/bitnet/bitbnet_b158.py
import torch


def absmean_quantize_weights(weights):
    """
    Quantizes the weights to -1, 0, or 1 using an absmean quantization function.

    Parameters:
    - weights (Tensor): The weights of a neural network layer.

    Returns:
    - Tensor: The quantized weights.
    """
    # Calculate the average absolute value (γ) of the weights
    gamma = torch.mean(torch.abs(weights))

    # Scale weights by γ and round to the nearest integer among {-1, 0, 1}
    quantized_weights = torch.clamp(torch.round(weights / gamma), min=-1, max=1)

    return quantized_weights
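As a quick sanity check of the quantizer above on a toy weight matrix (the values are arbitrary):

W = torch.tensor([[ 0.80, -0.05,  0.30],
                  [-0.60,  0.02, -0.90]])
print(absmean_quantize_weights(W))
# gamma = mean(|W|) ≈ 0.445, so W / gamma rounds and clamps to entries in {-1, 0, 1}:
# tensor([[ 1., -0.,  1.],
#         [-1.,  0., -1.]])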
Weights

The binarization of the weight matrix $W$ can be formulated as

$$\alpha = \frac{1}{nm}\sum_{ij} W_{ij},$$
$$\widetilde{W} = \mathrm{Sign}(W - \alpha), \qquad \mathrm{Sign}(W_{ij}) = \begin{cases} +1, & \text{if } W_{ij} > 0, \\ -1, & \text{if } W_{ij} \le 0. \end{cases}$$
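In PyTorch this is a mean-subtraction followed by a sign; a hedged per-tensor sketch (the group-wise version with a straight-through estimator appears in the BitLinear implementation below):

def binarize_weights(W: torch.Tensor) -> torch.Tensor:
    # alpha = (1/nm) * sum_ij W_ij
    alpha = W.mean()
    # Sign(W - alpha): +1 where W_ij > alpha, -1 otherwise (zeros map to -1)
    return torch.where(W > alpha, torch.ones_like(W), -torch.ones_like(W))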
Matrix multiplication

With the quantization functions above, the matrix multiplication can be written as

$$y = \widetilde{W}\widetilde{x}.$$
To preserve the variance after quantization, a LayerNorm function is applied before quantizing the activations, so the variance of the output $y$ is estimated to be 1:

$$y = \widetilde{W}\widetilde{x} = \widetilde{W}\,\mathrm{Quant}(\mathrm{LN}(x)) \times \frac{\beta\gamma}{Q_b},$$
$$\mathrm{LN}(x) = \frac{x - E(x)}{\sqrt{\mathrm{Var}(x) + \epsilon}}, \qquad \beta = \frac{1}{nm}\|W\|_1.$$
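Putting the pieces together, the forward pass in the equation above can be sketched per-tensor as follows (forward computation only, no straight-through estimator; the function name bitlinear_forward and its defaults are illustrative assumptions). The group-wise reference implementation from the BitNet repository follows.

import torch
from torch import nn

def bitlinear_forward(x: torch.Tensor, W: torch.Tensor, norm: nn.LayerNorm,
                      b: int = 8, eps: float = 1e-5) -> torch.Tensor:
    # y = W_tilde * Quant(LN(x)) * beta * gamma / Q_b
    Q_b = 2 ** (b - 1)
    alpha = W.mean()                                # alpha = (1/nm) * sum_ij W_ij
    W_tilde = torch.where(W > alpha, torch.ones_like(W), -torch.ones_like(W))  # Sign(W - alpha)
    beta = W.abs().mean()                           # beta = ||W||_1 / (n m)
    x = norm(x)                                     # LN(x)
    gamma = x.abs().max()                           # gamma = ||x||_inf
    x_tilde = torch.clamp(x * Q_b / (gamma + eps),  # Quant(LN(x))
                          -Q_b + eps, Q_b - eps)
    y = x_tilde @ W_tilde.t()                       # binary weights times quantized activations
    return y * beta * gamma / Q_b                   # rescale back to the original range

# e.g. y = bitlinear_forward(torch.randn(4, 10), torch.randn(5, 10), nn.LayerNorm(10))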
# https://github.com/kyegomez/BitNet/blob/main/bitnet/bitlinear.py
import torch
from torch import Tensor, nn


class BitLinear(nn.Linear):
    """
    BitLinear is a custom linear layer that performs binarization of weights and
    quantization of activations in a group-wise manner.

    Args:
        in_features (int): Number of input features.
        out_features (int): Number of output features.
        bias (bool, optional): If set to False, the layer will not learn an additive bias. Default is True.
        num_groups (int, optional): Number of groups to divide the weights and activations into. Default is 1.
    """

    def __init__(
        self,
        in_features: int,
        out_features: int,
        bias: bool = True,
        num_groups: int = 1,
        b: int = 8,
    ):
        super().__init__(in_features, out_features, bias)
        self.in_features = in_features
        self.out_features = out_features
        self.b = b
        self.num_groups = num_groups
        self.eps = 1e-5
        self.norm = nn.LayerNorm(in_features)

    def ste(self, x):
        """
        Applies the sign function for binarization and uses Straight-Through Estimator (STE) during backward pass.

        Args:
            x (Tensor): Input tensor.

        Returns:
            Tensor: Binarized tensor.
        """
        binarized_x = torch.sign(x)
        binarized_x = (binarized_x - x).detach() + x
        return binarized_x

    def binarize_weights_groupwise(self):
        """
        Binarizes the weights of the layer in a group-wise manner using STE.

        Returns:
            Tensor: Binarized weights tensor.
        """
        group_size = self.weight.shape[0] // self.num_groups
        binarized_weights = torch.zeros_like(self.weight)

        for g in range(self.num_groups):
            start_idx = g * group_size
            end_idx = (g + 1) * group_size
            weight_group = self.weight[start_idx:end_idx]

            alpha_g = weight_group.mean()
            binarized_weights[start_idx:end_idx] = self.ste(weight_group - alpha_g)

        return binarized_weights

    def quantize_activations_groupwise(self, x):
        """
        Quantizes the activations of the layer in a group-wise manner.

        Args:
            x (Tensor): Input tensor.
            b (int, optional): Number of bits for quantization. Default is 8.

        Returns:
            Tensor: Quantized activations tensor.
        """
        Q_b = 2 ** (self.b - 1)

        group_size = x.shape[0] // self.num_groups
        quantized_x = torch.zeros_like(x)

        for g in range(self.num_groups):
            start_idx = g * group_size
            end_idx = (g + 1) * group_size
            activation_group = x[start_idx:end_idx]

            gamma_g = activation_group.abs().max()
            quantized_x[start_idx:end_idx] = torch.clamp(
                activation_group * Q_b / (gamma_g + self.eps),
                -Q_b + self.eps,
                Q_b - self.eps,
            )

        return quantized_x

    def dequantize_activations_groupwise(self, x):
        """
        Dequantizes the activations of the layer in a group-wise manner.

        Args:
            x (Tensor): Quantized input tensor.
            b (int, optional): Number of bits used during the quantization. Default is 8.

        Returns:
            Tensor: Dequantized activations tensor.
        """
        Q_b = 2 ** (self.b - 1)
        dequantized_x = torch.zeros_like(x)
        for g in range(self.num_groups):
            start_idx = g * x.shape[0] // self.num_groups
            end_idx = (g + 1) * x.shape[0] // self.num_groups
            quantized_group = x[start_idx:end_idx]
            gamma_g = quantized_group.abs().max()
            dequantized_x[start_idx:end_idx] = quantized_group * gamma_g / Q_b
        return dequantized_x

    def forward(self, x: Tensor) -> Tensor:
        """
        Forward pass of the BitLinear layer.

        Args:
            x (Tensor): Input tensor.

        Returns:
            Tensor: Output tensor.
        """
        # Normalize input
        x = self.norm(x)

        # Binarize weights and quantize activations
        binarized_weights = self.binarize_weights_groupwise()

        # Perform linear transformation
        output = torch.nn.functional.linear(x, binarized_weights, self.bias)

        # Quantize activations
        output = self.quantize_activations_groupwise(output)

        # Dequantize activations
        output = self.dequantize_activations_groupwise(output)

        # Return output
        return output


# Example usage
bitlinear = BitLinear(10, 5, num_groups=2, b=8)
input_tensor = torch.randn(5, 10)  # Example input tensor
output = bitlinear(input_tensor)
print(output)  # Example output tensor

References
[NLP] [Large Models] BitNet: Training LLMs with a 1-bit Transformer
BitNet: Scaling 1-bit Transformers for Large Language Models
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch
DB-LLM: Accurate Dual-Binarization for Efficient LLMs
What to make of Microsoft's proposed BitNet b1.58