当前位置: 首页 > news >正文

手机网站 开发者模式天津建设信息网站

手机网站 开发者模式,天津建设信息网站,制作视频网站教程,深圳网站建设V芯ee8888e最近比较忙#xff0c;有一段时间没更新了#xff0c;最近yolov7用的比较多#xff0c;总结一下。上一篇yolov5及yolov7实战之剪枝_CodingInCV的博客-CSDN博客 我们讲了通过剪枝来裁剪我们的模型#xff0c;达到在精度损失不大的情况下#xff0c;提高模型速度的目的。上一…最近比较忙有一段时间没更新了最近yolov7用的比较多总结一下。上一篇yolov5及yolov7实战之剪枝_CodingInCV的博客-CSDN博客 我们讲了通过剪枝来裁剪我们的模型达到在精度损失不大的情况下提高模型速度的目的。上一篇是从速度的角度这一篇我们从检测性能的角度来改进yolov7yolov5也类似。 对于提高检测器的性能我们除了可以从增加数据、修改模型结构、修改loss等模型本身的角度出发外深度学习领域还有一个方式—蒸馏。简单的说蒸馏就是让性能更强的模型teacher, 参数量更大来指导性能更弱student模型从而提高student模型的性能。 蒸馏的方式有很多种比较简单暴力的比如直接让student模型来拟合teacher模型的输出特征图当然蒸馏也不是万能的毕竟student模型和teacher模型的参数量有差距student模型不一定能很好的学习teacher的知识对于自己的任务有没有作用也需要尝试。 本篇选择的方法是去年CVPR上的针对目标检测的蒸馏算法 yzd-v/FGD: Focal and Global Knowledge Distillation for Detectors (CVPR 2022) (github.com) 针对该方法的解读可以参考FGD-CVPR2022针对目标检测的焦点和全局蒸馏 - 知乎 (zhihu.com) 本篇暂时不涉及理论重点在把这个方法集成到yolov7训练。步骤如下。 载入teacher模型 蒸馏首先需要有一个teacher模型这个teacher模型一般和student同样结构只是参数量更大、层数更多。比如对于yolov5可以尝试用yolov5m来蒸馏yolov5s。 train.py增加一个命令行参数 parser.add_argument(--teacher-weights, typestr, default, helpinitial weights path)在train函数中载入teacher weights过程与原有的载入过程类似注意DP或者DDP模型也要对teacher模型做对应的处理。 # teacher modelif opt.teacher_weights:teacher_weights opt.teacher_weights# with torch_distributed_zero_first(rank):# teacher_weights attempt_download(teacher_weights) # download if not found locallyteacher_model Model(teacher_weights, ch3, ncnc).to(device) # create # load state_dictckpt torch.load(teacher_weights, map_locationdevice) # load checkpointstate_dict ckpt[model].float().state_dict() # to FP32teacher_model.load_state_dict(state_dict, strictTrue) # load#set to evalteacher_model.eval()#set IDetect to train mode# teacher_model.model[-1].train()logger.info(fLoad teacher model from {teacher_weights}) # report# DP modeif cuda and rank -1 and torch.cuda.device_count() 1:model torch.nn.DataParallel(model)if opt.teacher_weights:teacher_model torch.nn.DataParallel(teacher_model)# SyncBatchNormif opt.sync_bn and cuda and rank ! -1:model torch.nn.SyncBatchNorm.convert_sync_batchnorm(model).to(device)logger.info(Using SyncBatchNorm())if opt.teacher_weights:teacher_model torch.nn.SyncBatchNorm.convert_sync_batchnorm(teacher_model).to(device)teacher模型不进行梯度计算因此 if opt.teacher_weights:for param in teacher_model.parameters():param.requires_grad False蒸馏Loss 蒸馏loss是计算teacher模型的一层或者多层与student的对应层的相似度监督student模型向teacher模型靠近。对于yolov7可以去监督三个特征层。 参考FGD的开源代码我们在loss.py中增加一个FeatureLoss类, 参数暂时使用默认 class FeatureLoss(nn.Module):PyTorch version of Feature Distillation for General DetectorsArgs:student_channels(int): Number of channels in the students feature map.teacher_channels(int): Number of channels in the teachers feature map. temp (float, optional): Temperature coefficient. Defaults to 0.5.name (str): the loss name of the layeralpha_fgd (float, optional): Weight of fg_loss. Defaults to 0.001beta_fgd (float, optional): Weight of bg_loss. Defaults to 0.0005gamma_fgd (float, optional): Weight of mask_loss. Defaults to 0.0005lambda_fgd (float, optional): Weight of relation_loss. Defaults to 0.000005def __init__(self,student_channels,teacher_channels,temp0.5,alpha_fgd0.001,beta_fgd0.0005,gamma_fgd0.001,lambda_fgd0.000005,):super(FeatureLoss, self).__init__()self.temp tempself.alpha_fgd alpha_fgdself.beta_fgd beta_fgdself.gamma_fgd gamma_fgdself.lambda_fgd lambda_fgdif student_channels ! teacher_channels:self.align nn.Conv2d(student_channels, teacher_channels, kernel_size1, stride1, padding0)else:self.align Noneself.conv_mask_s nn.Conv2d(teacher_channels, 1, kernel_size1)self.conv_mask_t nn.Conv2d(teacher_channels, 1, kernel_size1)self.channel_add_conv_s nn.Sequential(nn.Conv2d(teacher_channels, teacher_channels//2, kernel_size1),nn.LayerNorm([teacher_channels//2, 1, 1]),nn.ReLU(inplaceTrue), # yapf: disablenn.Conv2d(teacher_channels//2, teacher_channels, kernel_size1))self.channel_add_conv_t nn.Sequential(nn.Conv2d(teacher_channels, teacher_channels//2, kernel_size1),nn.LayerNorm([teacher_channels//2, 1, 1]),nn.ReLU(inplaceTrue), # yapf: disablenn.Conv2d(teacher_channels//2, teacher_channels, kernel_size1))self.reset_parameters()def forward(self,preds_S,preds_T,gt_bboxes,img_metas):Forward function.Args:preds_S(Tensor): Bs*C*H*W, students feature mappreds_T(Tensor): Bs*C*H*W, teachers feature mapgt_bboxes(tuple): Bs*[nt*4], pixel decimal: (tl_x, tl_y, br_x, br_y)img_metas (list[dict]): Meta information of each image, e.g.,image size, scaling factor, etc.assert preds_S.shape[-2:] preds_T.shape[-2:], the output dim of teacher and student differdevice gt_bboxes.deviceself.to(device)if self.align is not None:preds_S self.align(preds_S)N,C,H,W preds_S.shapeS_attention_t, C_attention_t self.get_attention(preds_T, self.temp)S_attention_s, C_attention_s self.get_attention(preds_S, self.temp)Mask_fg torch.zeros_like(S_attention_t)# Mask_bg torch.ones_like(S_attention_t)wmin,wmax,hmin,hmax [],[],[],[]img_h, img_w img_metasbboxes gt_bboxes[:,2:6]#xywh2xyxybboxes xywh2xyxy(bboxes)new_boxxes torch.ones_like(bboxes)new_boxxes[:, 0] torch.floor(bboxes[:, 0]*W)new_boxxes[:, 2] torch.ceil(bboxes[:, 2]*W)new_boxxes[:, 1] torch.floor(bboxes[:, 1]*H)new_boxxes[:, 3] torch.ceil(bboxes[:, 3]*H)#to intnew_boxxes new_boxxes.int()for i in range(N):new_boxxes_i new_boxxes[torch.where(gt_bboxes[:,0]i)]wmin.append(new_boxxes_i[:, 0])wmax.append(new_boxxes_i[:, 2])hmin.append(new_boxxes_i[:, 1])hmax.append(new_boxxes_i[:, 3])area 1.0/(hmax[i].view(1,-1)1-hmin[i].view(1,-1))/(wmax[i].view(1,-1)1-wmin[i].view(1,-1))for j in range(len(new_boxxes_i)):Mask_fg[i][hmin[i][j]:hmax[i][j]1, wmin[i][j]:wmax[i][j]1] \torch.maximum(Mask_fg[i][hmin[i][j]:hmax[i][j]1, wmin[i][j]:wmax[i][j]1], area[0][j])Mask_bg torch.where(Mask_fg 0, 0., 1.)Mask_bg_sum torch.sum(Mask_bg, dim(1,2))Mask_bg[Mask_bg_sum0] / Mask_bg_sum[Mask_bg_sum0].unsqueeze(1).unsqueeze(2)fg_loss, bg_loss self.get_fea_loss(preds_S, preds_T, Mask_fg, Mask_bg, C_attention_s, C_attention_t, S_attention_s, S_attention_t)mask_loss self.get_mask_loss(C_attention_s, C_attention_t, S_attention_s, S_attention_t)rela_loss self.get_rela_loss(preds_S, preds_T)loss self.alpha_fgd * fg_loss self.beta_fgd * bg_loss \ self.gamma_fgd * mask_loss self.lambda_fgd * rela_lossreturn loss, loss.detach()def get_attention(self, preds, temp): preds: Bs*C*W*H N, C, H, W preds.shapevalue torch.abs(preds)# Bs*W*Hfea_map value.mean(axis1, keepdimTrue)S_attention (H * W * F.softmax((fea_map/temp).view(N,-1), dim1)).view(N, H, W)# Bs*Cchannel_map value.mean(axis2,keepdimFalse).mean(axis2,keepdimFalse)C_attention C * F.softmax(channel_map/temp, dim1)return S_attention, C_attentiondef get_fea_loss(self, preds_S, preds_T, Mask_fg, Mask_bg, C_s, C_t, S_s, S_t):loss_mse nn.MSELoss(reductionsum)Mask_fg Mask_fg.unsqueeze(dim1)Mask_bg Mask_bg.unsqueeze(dim1)C_t C_t.unsqueeze(dim-1)C_t C_t.unsqueeze(dim-1)S_t S_t.unsqueeze(dim1)fea_t torch.mul(preds_T, torch.sqrt(S_t))fea_t torch.mul(fea_t, torch.sqrt(C_t))fg_fea_t torch.mul(fea_t, torch.sqrt(Mask_fg))bg_fea_t torch.mul(fea_t, torch.sqrt(Mask_bg))fea_s torch.mul(preds_S, torch.sqrt(S_t))fea_s torch.mul(fea_s, torch.sqrt(C_t))fg_fea_s torch.mul(fea_s, torch.sqrt(Mask_fg))bg_fea_s torch.mul(fea_s, torch.sqrt(Mask_bg))fg_loss loss_mse(fg_fea_s, fg_fea_t)/len(Mask_fg)bg_loss loss_mse(bg_fea_s, bg_fea_t)/len(Mask_bg)return fg_loss, bg_lossdef get_mask_loss(self, C_s, C_t, S_s, S_t):mask_loss torch.sum(torch.abs((C_s-C_t)))/len(C_s) torch.sum(torch.abs((S_s-S_t)))/len(S_s)return mask_lossdef spatial_pool(self, x, in_type):batch, channel, width, height x.size()input_x x# [N, C, H * W]input_x input_x.view(batch, channel, height * width)# [N, 1, C, H * W]input_x input_x.unsqueeze(1)# [N, 1, H, W]if in_type 0:context_mask self.conv_mask_s(x)else:context_mask self.conv_mask_t(x)# [N, 1, H * W]context_mask context_mask.view(batch, 1, height * width)# [N, 1, H * W]context_mask F.softmax(context_mask, dim2)# [N, 1, H * W, 1]context_mask context_mask.unsqueeze(-1)# [N, 1, C, 1]context torch.matmul(input_x, context_mask)# [N, C, 1, 1]context context.view(batch, channel, 1, 1)return contextdef get_rela_loss(self, preds_S, preds_T):loss_mse nn.MSELoss(reductionsum)context_s self.spatial_pool(preds_S, 0)context_t self.spatial_pool(preds_T, 1)out_s preds_Sout_t preds_Tchannel_add_s self.channel_add_conv_s(context_s)out_s out_s channel_add_schannel_add_t self.channel_add_conv_t(context_t)out_t out_t channel_add_trela_loss loss_mse(out_s, out_t)/len(out_s)return rela_lossdef last_zero_init(self, m):if isinstance(m, nn.Sequential):constant_init(m[-1], val0)else:constant_init(m, val0)def reset_parameters(self):kaiming_init(self.conv_mask_s, modefan_in)kaiming_init(self.conv_mask_t, modefan_in)self.conv_mask_s.inited Trueself.conv_mask_t.inited Trueself.last_zero_init(self.channel_add_conv_s)self.last_zero_init(self.channel_add_conv_t)实例化FeatureLoss 在train.py中实例化我们定义的FeatureLoss由于我们要蒸馏三层所以需要定一个蒸馏损失的数组 if opt.teacher_weights:student_kd_layers hyp[student_kd_layers]teacher_kd_layers hyp[teacher_kd_layers]dump_image torch.zeros((1, 3, imgsz, imgsz), devicedevice)targets torch.Tensor([[0, 0, 0, 0, 0, 0]]).to(device)_, features model(dump_image, extra_features student_kd_layers) # forward_, teacher_features teacher_model(dump_image,extra_featuresteacher_kd_layers)kd_losses []for i in range(len(features)):feature features[i]teacher_feature teacher_features[i]_, student_channels, _ , _ feature.shape_, teacher_channels, _ , _ teacher_feature.shapekd_losses.append(FeatureLoss(student_channels,teacher_channels))其中hyp[‘xxx_kd_layers’]是用于指定我们要蒸馏的层序号。 为了提取出我们需要的层的特征图我们还需要对模型推理的代码进行修改这个放在下一篇这一篇先把主要流程过一遍。 蒸馏训练 与普通loss一样在训练中首先计算蒸馏loss, 然后进行反向传播区别只是计算蒸馏loss时需要使用teacher模型也对数据进行推理。 if opt.teacher_weights:pred, features model(imgs, extra_features student_kd_layers) # forward_, teacher_features teacher_model(imgs, extra_features teacher_kd_layers)if loss_ota not in hyp or hyp[loss_ota] 1 and epoch ota_start:loss, loss_items compute_loss_ota(pred, targets.to(device), imgs)else:loss, loss_items compute_loss(pred, targets.to(device)) # loss scaled by batch_size# kd lossloss_items torch.cat((loss_items[0].unsqueeze(0), loss_items[1].unsqueeze(0), loss_items[2].unsqueeze(0), torch.zeros(1, devicedevice), loss_items[3].unsqueeze(0)))loss_items[-1]*imgs.shape[0]for i in range(len(features)):feature features[i]teacher_feature teacher_features[i]kd_loss, kd_loss_item kd_losses[i](feature, teacher_feature, targets.to(device), [imgsz,imgsz])loss kd_lossloss_items[3] kd_loss_itemloss_items[4] kd_loss_item在这里我们将kd_loss累加到了loss上。计算出总的loss,其他就与普通训练一样了。 结语 这篇文章简述了一下yolov7的蒸馏过程更多细节将在下一篇中讲述。
http://www.dnsts.com.cn/news/164735.html

相关文章:

  • 云购网站建设SEO网站链接模型
  • 住房与建设部网站wordpress给图片加logo
  • 网站开发的前后端是哪些wordpress 字体 插件下载地址
  • 建设厅网站装修合同模板苏州网站建设哪家更好
  • 建站历史查询企业网站策划案例
  • 黄骅市网站建设公司哪里学网站建设与管理
  • 网统管公司的网站托管服务怎么样珠海网站建设 旭洁
  • 资料代做网站wordpress老提示更新
  • php网站开发视频教程下载教育网站开发需求分析
  • 网站的策划和建设一个网络空间做两个网站
  • 不会编程如何做自己的网站园林设计
  • 网站 制作价格做网销做什么网站
  • 全景网站如何做wordpress 加密解密
  • 佛山的网站建设承德工程建设信息网站
  • 地方性门户网站有哪些个人网页完整代码
  • app软件制作网站安装一个宽带多少钱
  • 网站备案 失败vitality wordpress
  • 光谷中心城建设投资有限公司网站东莞网站开发报价
  • 网站页面结构怎么做有利于优化seo技术平台
  • 淘宝网站建设百度百科宣城网站seo诊断
  • 网页编辑与网站编辑揭阳seo快速排名
  • 做箱包哪个网站好昆明经济技术开发区官方门户网站
  • 网站建设如何把代码佛山100强企业名单
  • 中山祥云做的网站怎么样百度百科网络营销岗位职责和任职要求
  • 网站建设技术线路选择建新建设集团有限公司网站
  • 十一冶建设集团有限责任公司网站自己做的网站如何上百度
  • pc网站建设哪个好房山营销型网站建设
  • 东莞网站建设乐云seo在线制作网站内容栏目
  • 封面型网页网站有哪些北京公司网站怎么制作
  • 网站下载不了的视频怎么下载北京注册公司地址可以是住宅吗