BitNet: quantization and the BitLinear layer

BitNet quantizes both the activations and the weights of each linear layer in the Transformer. The notes below follow the formulas from the paper and the reference implementation.

Input data

The model quantizes the input to b bits with the absmax quantization method, mapping it into $[-Q_b, Q_b]$, where $Q_b = 2^{b-1}$:

$$\widetilde{x} = \mathrm{Quant}(x) = \mathrm{Clip}\left(x \times \frac{Q_b}{\gamma},\ -Q_b + \epsilon,\ Q_b - \epsilon\right),$$

$$\mathrm{Clip}(x, a, b) = \max(a, \min(b, x)), \qquad \gamma = \|x\|_\infty,$$

where ε is a small floating-point number that prevents overflow when the clipping is performed.

For comparison, the BitNet b1.58 follow-up quantizes weights to the ternary set {-1, 0, 1} using an absmean function:

```python
# https://github.com/kyegomez/BitNet/blob/main/bitnet/bitbnet_b158.py
import torch


def absmean_quantize_weights(weights):
    """
    Quantizes the weights to -1, 0, or 1 using an absmean quantization function.

    Parameters:
    - weights (Tensor): The weights of a neural network layer.

    Returns:
    - Tensor: The quantized weights.
    """
    # Calculate the average absolute value (γ) of the weights
    gamma = torch.mean(torch.abs(weights))

    # Scale weights by γ and round to the nearest integer among {-1, 0, 1}
    quantized_weights = torch.clamp(torch.round(weights / gamma), min=-1, max=1)

    return quantized_weights
```

Weights

Binarization of the weight matrix W can be formulated as

$$\alpha = \frac{1}{nm}\sum_{ij} W_{ij},$$

$$\widetilde{W} = \mathrm{Sign}(W - \alpha), \qquad \mathrm{Sign}(W_{ij}) = \begin{cases} +1, & \text{if } W_{ij} > 0, \\ -1, & \text{if } W_{ij} \le 0. \end{cases}$$

Matrix multiplication

With the quantization functions above, the matrix multiplication can be written as

$$y = \widetilde{W}\,\widetilde{x}.$$

To preserve the variance after quantization, a LayerNorm is applied before the activation quantization; the variance of the output y is then estimated to be 1:

$$y = \widetilde{W}\,\widetilde{x} = \widetilde{W}\,\mathrm{Quant}(\mathrm{LN}(x)) \times \frac{\beta\gamma}{Q_b},$$

$$\mathrm{LN}(x) = \frac{x - E(x)}{\sqrt{\mathrm{Var}(x) + \epsilon}}, \qquad \beta = \frac{1}{nm}\|W\|_1.$$
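To make the scales concrete, here is a small sketch of the equations above on toy numbers. The `absmax_quantize` helper and all values are illustrative, not from the BitNet repository, and LayerNorm is omitted for brevity:

```python
import torch


def absmax_quantize(x, b=8, eps=1e-5):
    # Q_b = 2^(b-1); for b = 8, Q_b = 128
    Q_b = 2 ** (b - 1)
    # gamma = ||x||_inf, the largest absolute value in x
    gamma = x.abs().max()
    # Scale into [-Q_b, Q_b], leaving an eps margin against overflow
    x_q = torch.clamp(x * Q_b / gamma, -Q_b + eps, Q_b - eps)
    return x_q, gamma


x = torch.tensor([0.5, -2.0, 1.0])
x_q, gamma = absmax_quantize(x)            # gamma = 2.0, x_q ≈ [32, -128, 64]

W = torch.tensor([[0.3, -0.2, 0.1]])       # a 1 x 3 weight matrix
alpha = W.mean()                           # zero-mean shift, alpha ≈ 0.067
W_bin = torch.where(W - alpha > 0,
                    torch.tensor(1.0),
                    torch.tensor(-1.0))    # Sign(W - alpha) = [[+1, -1, +1]]

beta = W.abs().mean()                      # beta = ||W||_1 / (nm) = 0.2
Q_b = 2 ** 7
y = (W_bin @ x_q) * beta * gamma / Q_b     # dequantized output ≈ 0.70
print(y)                                   # exact W @ x would give 0.65
```

Note how β and γ rescale the integer-range product back to the magnitude of the real-valued W·x (about 0.70 here versus the exact 0.65).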
The reference implementation wires these pieces together in a BitLinear layer that normalizes the input, binarizes the weights group-wise with a straight-through estimator (STE), and then quantizes and dequantizes the activations group-wise:

```python
# https://github.com/kyegomez/BitNet/blob/main/bitnet/bitlinear.py
import torch
from torch import Tensor, nn


class BitLinear(nn.Linear):
    """
    BitLinear is a custom linear layer that performs binarization of weights
    and quantization of activations in a group-wise manner.

    Args:
        in_features (int): Number of input features.
        out_features (int): Number of output features.
        bias (bool, optional): If set to False, the layer will not learn an
            additive bias. Default is True.
        num_groups (int, optional): Number of groups to divide the weights
            and activations into. Default is 1.
        b (int, optional): Number of bits for activation quantization.
            Default is 8.
    """

    def __init__(
        self,
        in_features: int,
        out_features: int,
        bias: bool = True,
        num_groups: int = 1,
        b: int = 8,
    ):
        super().__init__(in_features, out_features, bias)
        self.in_features = in_features
        self.out_features = out_features
        self.b = b
        self.num_groups = num_groups
        self.eps = 1e-5
        self.norm = nn.LayerNorm(in_features)

    def ste(self, x):
        """
        Applies the sign function for binarization and uses the
        Straight-Through Estimator (STE) during the backward pass.

        Args:
            x (Tensor): Input tensor.

        Returns:
            Tensor: Binarized tensor.
        """
        binarized_x = torch.sign(x)
        # Forward pass uses sign(x); backward pass is the identity,
        # because the non-differentiable difference is detached
        binarized_x = (binarized_x - x).detach() + x
        return binarized_x

    def binarize_weights_groupwise(self):
        """
        Binarizes the weights of the layer in a group-wise manner using STE.

        Returns:
            Tensor: Binarized weights tensor.
        """
        # Note: assumes out_features is divisible by num_groups
        group_size = self.weight.shape[0] // self.num_groups
        binarized_weights = torch.zeros_like(self.weight)

        for g in range(self.num_groups):
            start_idx = g * group_size
            end_idx = (g + 1) * group_size
            weight_group = self.weight[start_idx:end_idx]

            # Per-group zero-mean shift (alpha), then binarize
            alpha_g = weight_group.mean()
            binarized_weights[start_idx:end_idx] = self.ste(weight_group - alpha_g)

        return binarized_weights

    def quantize_activations_groupwise(self, x):
        """
        Quantizes the activations of the layer in a group-wise manner.

        Args:
            x (Tensor): Input tensor.

        Returns:
            Tensor: Quantized activations tensor.
        """
        Q_b = 2 ** (self.b - 1)

        # Note: assumes the grouped dimension is divisible by num_groups
        group_size = x.shape[0] // self.num_groups
        quantized_x = torch.zeros_like(x)

        for g in range(self.num_groups):
            start_idx = g * group_size
            end_idx = (g + 1) * group_size
            activation_group = x[start_idx:end_idx]

            # Per-group absmax scale (gamma), clipped to [-Q_b + eps, Q_b - eps]
            gamma_g = activation_group.abs().max()
            quantized_x[start_idx:end_idx] = torch.clamp(
                activation_group * Q_b / (gamma_g + self.eps),
                -Q_b + self.eps,
                Q_b - self.eps,
            )

        return quantized_x

    def dequantize_activations_groupwise(self, x):
        """
        Dequantizes the activations of the layer in a group-wise manner.

        Args:
            x (Tensor): Quantized input tensor.

        Returns:
            Tensor: Dequantized activations tensor.
        """
        Q_b = 2 ** (self.b - 1)
        dequantized_x = torch.zeros_like(x)

        for g in range(self.num_groups):
            start_idx = g * x.shape[0] // self.num_groups
            end_idx = (g + 1) * x.shape[0] // self.num_groups
            quantized_group = x[start_idx:end_idx]

            # Rescale back to the real-valued range using the group's absmax
            gamma_g = quantized_group.abs().max()
            dequantized_x[start_idx:end_idx] = quantized_group * gamma_g / Q_b

        return dequantized_x

    def forward(self, x: Tensor) -> Tensor:
        """
        Forward pass of the BitLinear layer.

        Args:
            x (Tensor): Input tensor.

        Returns:
            Tensor: Output tensor.
        """
        # Normalize input
        x = self.norm(x)

        # Binarize weights group-wise
        binarized_weights = self.binarize_weights_groupwise()

        # Perform the linear transformation
        output = torch.nn.functional.linear(x, binarized_weights, self.bias)

        # Quantize, then dequantize, the activations
        output = self.quantize_activations_groupwise(output)
        output = self.dequantize_activations_groupwise(output)

        return output


# Example usage (dimensions chosen divisible by num_groups, so that the
# group-wise loops cover every row)
bitlinear = BitLinear(10, 6, num_groups=2, b=8)
input_tensor = torch.randn(6, 10)  # Example input tensor
output = bitlinear(input_tensor)
print(output)  # Example output tensor
```

References

- [NLP] [Large Models] BitNet: Training LLMs with a 1-bit Transformer
- BitNet: Scaling 1-bit Transformers for Large Language Models
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
- Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch
- DB-LLM: Accurate Dual-Binarization for Efficient LLMs
- What do you think of Microsoft's BitNet b1.58? (Zhihu discussion)
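A closing aside on `BitLinear.ste` above: `torch.sign` has zero gradient almost everywhere, so binarized weights would normally be untrainable. The `(binarized_x - x).detach() + x` pattern is the straight-through estimator at work; a minimal standalone sketch of the effect (illustrative, not repository code):

```python
import torch

x = torch.randn(4, requires_grad=True)

# Forward pass sees sign(x); the non-differentiable part (sign(x) - x)
# is detached, so the backward pass treats binarization as the identity.
binarized = (torch.sign(x) - x).detach() + x

binarized.sum().backward()
print(x.grad)  # tensor of ones: gradients flow as if no binarization happened
```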