Preface
Hi everyone, I'm Snu77, and this is the RT-DETR Effective Improvements column.
This column builds improvements on top of the ultralytics version of RT-DETR. The content is updated continuously, with 3-10 new articles per week.
The column uses ResNet18 and ResNet50 as the base versions to modify, and the modifications also work with the ResNet32, ResNet101 and PPHGNet variants. The ResNet backbones are ported 1:1 from the official RT-DETR implementation, so the parameter counts stay essentially identical (the difference is tiny), unlike the ResNet that ships with the ultralytics repository. Some default settings in the ultralytics repository also conflict with RT-DETR, so I will show you how to adjust those parameters and the code, so that running RT-DETR under ultralytics is genuinely equivalent to the official RT-DETR version.
You are welcome to subscribe to this column and learn RT-DETR together!
1. Introduction
The improvement introduced in this article is the feature-extraction network EfficientFormerV2, a Vision Transformer optimized for mobile devices that achieves low latency and high parameter efficiency by rethinking the design choices of ViTs. After switching to this backbone, our parameter count drops by roughly fifty percent and GFLOPs also drop by about fifty percent. As an efficient, lightweight network it is highly recommended, both for accuracy and for overall results.

Column link: RT-DETR paper-oriented column - continuously reproducing top-conference content (the paper harvester, RT-DETR)

Contents
1. Introduction
2. EfficientFormerV2 Principles
3. Core Code of EfficientFormerV2
4. Step-by-Step Guide to Adding EfficientFormerV2
4.1 Modification 1
4.2 Modification 2
4.3 Modification 3
4.4 Modification 4
4.5 Modification 5
4.6 Modification 6
4.7 Modification 7
4.8 Modification 8
4.9 Fixing RT-DETR Not Printing GFLOPs
4.10 Optional Modification
5. YAML File for EfficientFormerV2
5.1 The YAML File
5.2 Training Script
5.3 Screenshot of Successful Training
6. Summary

2. EfficientFormerV2 Principles
Paper address: official paper
Code address: official code
EfficientFormerV2 is a Vision Transformer optimized for mobile devices. By revisiting the design choices of ViTs, it achieves low latency and high parameter efficiency.
Its key improvements include:
1. Unified feed-forward network (FFN): depthwise convolutions are integrated into the FFN to strengthen local information processing, removing the explicit local token mixer and simplifying the architecture.
2. Search-space refinement: exploring variations in network depth and width shows that deeper, narrower networks improve accuracy while reducing both parameter count and latency.
3. MHSA (multi-head self-attention) improvements: local information is injected into the value matrix and communication is added between attention heads, improving performance without increasing model size or latency.
4. Attention at high resolution: strategies for applying MHSA in the early stages are studied, and the attention module is simplified to keep it efficient.
5. Dual-path attention downsampling: static local downsampling, learnable local downsampling, and a strided-convolution residual connection are combined into a local-global scheme that further improves accuracy.
6. Joint optimization of model size and speed: a refined joint search algorithm optimizes size and speed together to find the optimal vision backbone for mobile deployment.

The figure shows the network architecture of EfficientFormerV2; it consists of several stages, each with different components and functions.
a. Overall architecture: starting from the input stem, four stages progressively reduce the spatial resolution while increasing the feature dimension.
b. Unified FFN block: combines pooling with depthwise separable convolution to strengthen local feature extraction; a depthwise convolution layer (DWConv 3x3 + BN) reinforces local information processing (a minimal sketch follows this list).
c. MHSA block: multi-head self-attention that introduces locality and a Talking Head mechanism (communication between attention heads) to improve performance.
d/e. High-resolution attention: attention is applied in the early layers of the network to handle high-resolution feature maps.
f. Dual-path attention downsampling: combines conventional convolution with attention, fusing convolution and pooling to downsample features effectively.
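To make item b concrete, here is a minimal PyTorch sketch of the unified-FFN idea: a 1x1 expansion, a depthwise 3x3 convolution for local token mixing, and a 1x1 projection. The class name and layer layout are illustrative only; the actual module used in this article is the Mlp class with mid_conv=True in Section 3.

import torch
import torch.nn as nn

class UnifiedFFNSketch(nn.Module):
    # Illustrative sketch: local mixing is folded into the FFN via a depthwise 3x3
    # convolution instead of a separate token-mixer layer.
    def __init__(self, dim, expansion=4):
        super().__init__()
        hidden = dim * expansion
        self.fc1 = nn.Conv2d(dim, hidden, 1)                              # 1x1 expansion
        self.dw = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)  # depthwise 3x3, local mixing
        self.fc2 = nn.Conv2d(hidden, dim, 1)                              # 1x1 projection
        self.act = nn.GELU()

    def forward(self, x):  # x: (B, C, H, W)
        x = self.act(self.fc1(x))
        x = self.act(self.dw(x))
        return self.fc2(x)

x = torch.randn(1, 64, 32, 32)
print(UnifiedFFNSketch(64)(x).shape)  # torch.Size([1, 64, 32, 32])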
3. Core Code of EfficientFormerV2

See Section 4 for how to use this code.
import os
import torch
import torch.nn as nn
import math
import itertools
from timm.models.layers import DropPath, trunc_normal_, to_2tuple

__all__ = ['efficientformerv2_s0', 'efficientformerv2_s1', 'efficientformerv2_s2', 'efficientformerv2_l']

EfficientFormer_width = {
    'L': [40, 80, 192, 384],  # 26m 83.3% 6attn
    'S2': [32, 64, 144, 288],  # 12m 81.6% 4attn dp0.02
    'S1': [32, 48, 120, 224],  # 6.1m 79.0
    'S0': [32, 48, 96, 176],  # 75.0 75.7
}

EfficientFormer_depth = {
    'L': [5, 5, 15, 10],  # 26m 83.3%
    'S2': [4, 4, 12, 8],  # 12m
    'S1': [3, 3, 9, 6],  # 79.0
    'S0': [2, 2, 6, 4],  # 75.7
}
# 26m
expansion_ratios_L = {
    '0': [4, 4, 4, 4, 4],
    '1': [4, 4, 4, 4, 4],
    '2': [4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4],
    '3': [4, 4, 4, 3, 3, 3, 3, 4, 4, 4],
}
# 12m
expansion_ratios_S2 = {
    '0': [4, 4, 4, 4],
    '1': [4, 4, 4, 4],
    '2': [4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4],
    '3': [4, 4, 3, 3, 3, 3, 4, 4],
}
# 6.1m
expansion_ratios_S1 = {
    '0': [4, 4, 4],
    '1': [4, 4, 4],
    '2': [4, 4, 3, 3, 3, 3, 4, 4, 4],
    '3': [4, 4, 3, 3, 4, 4],
}
# 3.5m
expansion_ratios_S0 = {
    '0': [4, 4],
    '1': [4, 4],
    '2': [4, 3, 3, 3, 4, 4],
    '3': [4, 3, 3, 4],
}
class Attention4D(torch.nn.Module):
    # 4D multi-head self-attention on (B, C, H, W) feature maps, with learned
    # relative position biases and talking-head projections.
    def __init__(self, dim=384, key_dim=32, num_heads=8,
                 attn_ratio=4,
                 resolution=7,
                 act_layer=nn.ReLU,
                 stride=None):
        super().__init__()
        self.num_heads = num_heads
        self.scale = key_dim ** -0.5
        self.key_dim = key_dim
        self.nh_kd = nh_kd = key_dim * num_heads

        if stride is not None:
            self.resolution = math.ceil(resolution / stride)
            self.stride_conv = nn.Sequential(
                nn.Conv2d(dim, dim, kernel_size=3, stride=stride, padding=1, groups=dim),
                nn.BatchNorm2d(dim), )
            self.upsample = nn.Upsample(scale_factor=stride, mode='bilinear')
        else:
            self.resolution = resolution
            self.stride_conv = None
            self.upsample = None

        self.N = self.resolution ** 2
        self.N2 = self.N
        self.d = int(attn_ratio * key_dim)
        self.dh = int(attn_ratio * key_dim) * num_heads
        self.attn_ratio = attn_ratio
        h = self.dh + nh_kd * 2

        self.q = nn.Sequential(nn.Conv2d(dim, self.num_heads * self.key_dim, 1),
                               nn.BatchNorm2d(self.num_heads * self.key_dim), )
        self.k = nn.Sequential(nn.Conv2d(dim, self.num_heads * self.key_dim, 1),
                               nn.BatchNorm2d(self.num_heads * self.key_dim), )
        self.v = nn.Sequential(nn.Conv2d(dim, self.num_heads * self.d, 1),
                               nn.BatchNorm2d(self.num_heads * self.d), )
        self.v_local = nn.Sequential(nn.Conv2d(self.num_heads * self.d, self.num_heads * self.d,
                                               kernel_size=3, stride=1, padding=1, groups=self.num_heads * self.d),
                                     nn.BatchNorm2d(self.num_heads * self.d), )
        self.talking_head1 = nn.Conv2d(self.num_heads, self.num_heads, kernel_size=1, stride=1, padding=0)
        self.talking_head2 = nn.Conv2d(self.num_heads, self.num_heads, kernel_size=1, stride=1, padding=0)

        self.proj = nn.Sequential(act_layer(),
                                  nn.Conv2d(self.dh, dim, 1),
                                  nn.BatchNorm2d(dim), )

        points = list(itertools.product(range(self.resolution), range(self.resolution)))
        N = len(points)
        attention_offsets = {}
        idxs = []
        for p1 in points:
            for p2 in points:
                offset = (abs(p1[0] - p2[0]), abs(p1[1] - p2[1]))
                if offset not in attention_offsets:
                    attention_offsets[offset] = len(attention_offsets)
                idxs.append(attention_offsets[offset])
        self.attention_biases = torch.nn.Parameter(torch.zeros(num_heads, len(attention_offsets)))
        self.register_buffer('attention_bias_idxs',
                             torch.LongTensor(idxs).view(N, N))

    @torch.no_grad()
    def train(self, mode=True):
        super().train(mode)
        if mode and hasattr(self, 'ab'):
            del self.ab
        else:
            self.ab = self.attention_biases[:, self.attention_bias_idxs]

    def forward(self, x):  # x (B, C, H, W)
        B, C, H, W = x.shape
        if self.stride_conv is not None:
            x = self.stride_conv(x)

        q = self.q(x).flatten(2).reshape(B, self.num_heads, -1, self.N).permute(0, 1, 3, 2)
        k = self.k(x).flatten(2).reshape(B, self.num_heads, -1, self.N).permute(0, 1, 2, 3)
        v = self.v(x)
        v_local = self.v_local(v)
        v = v.flatten(2).reshape(B, self.num_heads, -1, self.N).permute(0, 1, 3, 2)

        attn = ((q @ k) * self.scale +
                (self.attention_biases[:, self.attention_bias_idxs]
                 if self.training else self.ab))
        # attn = (q @ k) * self.scale
        attn = self.talking_head1(attn)
        attn = attn.softmax(dim=-1)
        attn = self.talking_head2(attn)

        x = (attn @ v)

        out = x.transpose(2, 3).reshape(B, self.dh, self.resolution, self.resolution) + v_local
        if self.upsample is not None:
            out = self.upsample(out)

        out = self.proj(out)
        return out


def stem(in_chs, out_chs, act_layer=nn.ReLU):
    # Two stride-2 convolutions: 4x spatial downsampling at the network input.
    return nn.Sequential(
        nn.Conv2d(in_chs, out_chs // 2, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(out_chs // 2),
        act_layer(),
        nn.Conv2d(out_chs // 2, out_chs, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(out_chs),
        act_layer(),
    )


class LGQuery(torch.nn.Module):
    # Local-global query generation used by the attention downsampling block.
    def __init__(self, in_dim, out_dim, resolution1, resolution2):
        super().__init__()
        self.resolution1 = resolution1
        self.resolution2 = resolution2
        self.pool = nn.AvgPool2d(1, 2, 0)
        self.local = nn.Sequential(
            nn.Conv2d(in_dim, in_dim, kernel_size=3, stride=2, padding=1, groups=in_dim), )
        self.proj = nn.Sequential(nn.Conv2d(in_dim, out_dim, 1),
                                  nn.BatchNorm2d(out_dim), )

    def forward(self, x):
        local_q = self.local(x)
        pool_q = self.pool(x)
        q = local_q + pool_q
        q = self.proj(q)
        return q


class Attention4DDownsample(torch.nn.Module):
    # Attention-based downsampling: queries live on the downsampled grid,
    # keys/values stay on the full-resolution grid.
    def __init__(self, dim=384, key_dim=16, num_heads=8,
                 attn_ratio=4,
                 resolution=7,
                 out_dim=None,
                 act_layer=None, ):
        super().__init__()
        self.num_heads = num_heads
        self.scale = key_dim ** -0.5
        self.key_dim = key_dim
        self.nh_kd = nh_kd = key_dim * num_heads
        self.resolution = resolution
        self.d = int(attn_ratio * key_dim)
        self.dh = int(attn_ratio * key_dim) * num_heads
        self.attn_ratio = attn_ratio
        h = self.dh + nh_kd * 2

        if out_dim is not None:
            self.out_dim = out_dim
        else:
            self.out_dim = dim
        self.resolution2 = math.ceil(self.resolution / 2)
        self.q = LGQuery(dim, self.num_heads * self.key_dim, self.resolution, self.resolution2)

        self.N = self.resolution ** 2
        self.N2 = self.resolution2 ** 2

        self.k = nn.Sequential(nn.Conv2d(dim, self.num_heads * self.key_dim, 1),
                               nn.BatchNorm2d(self.num_heads * self.key_dim), )
        self.v = nn.Sequential(nn.Conv2d(dim, self.num_heads * self.d, 1),
                               nn.BatchNorm2d(self.num_heads * self.d), )
        self.v_local = nn.Sequential(nn.Conv2d(self.num_heads * self.d, self.num_heads * self.d,
                                               kernel_size=3, stride=2, padding=1, groups=self.num_heads * self.d),
                                     nn.BatchNorm2d(self.num_heads * self.d), )

        self.proj = nn.Sequential(act_layer(),
                                  nn.Conv2d(self.dh, self.out_dim, 1),
                                  nn.BatchNorm2d(self.out_dim), )

        points = list(itertools.product(range(self.resolution), range(self.resolution)))
        points_ = list(itertools.product(range(self.resolution2), range(self.resolution2)))
        N = len(points)
        N_ = len(points_)
        attention_offsets = {}
        idxs = []
        for p1 in points_:
            for p2 in points:
                size = 1
                offset = (abs(p1[0] * math.ceil(self.resolution / self.resolution2) - p2[0] + (size - 1) / 2),
                          abs(p1[1] * math.ceil(self.resolution / self.resolution2) - p2[1] + (size - 1) / 2))
                if offset not in attention_offsets:
                    attention_offsets[offset] = len(attention_offsets)
                idxs.append(attention_offsets[offset])
        self.attention_biases = torch.nn.Parameter(torch.zeros(num_heads, len(attention_offsets)))
        self.register_buffer('attention_bias_idxs',
                             torch.LongTensor(idxs).view(N_, N))

    @torch.no_grad()
    def train(self, mode=True):
        super().train(mode)
        if mode and hasattr(self, 'ab'):
            del self.ab
        else:
            self.ab = self.attention_biases[:, self.attention_bias_idxs]

    def forward(self, x):  # x (B, C, H, W)
        B, C, H, W = x.shape

        q = self.q(x).flatten(2).reshape(B, self.num_heads, -1, self.N2).permute(0, 1, 3, 2)
        k = self.k(x).flatten(2).reshape(B, self.num_heads, -1, self.N).permute(0, 1, 2, 3)
        v = self.v(x)
        v_local = self.v_local(v)
        v = v.flatten(2).reshape(B, self.num_heads, -1, self.N).permute(0, 1, 3, 2)

        attn = ((q @ k) * self.scale +
                (self.attention_biases[:, self.attention_bias_idxs]
                 if self.training else self.ab))
        # attn = (q @ k) * self.scale
        attn = attn.softmax(dim=-1)

        x = (attn @ v).transpose(2, 3)
        out = x.reshape(B, self.dh, self.resolution2, self.resolution2) + v_local
        out = self.proj(out)
        return out


class Embedding(nn.Module):
    # Patch embedding between stages; supports a light (depthwise) variant and an
    # attention-assisted (asub) variant that adds Attention4DDownsample to the conv path.
    def __init__(self, patch_size=3, stride=2, padding=1,
                 in_chans=3, embed_dim=768, norm_layer=nn.BatchNorm2d,
                 light=False, asub=False, resolution=None, act_layer=nn.ReLU, attn_block=Attention4DDownsample):
        super().__init__()
        self.light = light
        self.asub = asub

        if self.light:
            self.new_proj = nn.Sequential(
                nn.Conv2d(in_chans, in_chans, kernel_size=3, stride=2, padding=1, groups=in_chans),
                nn.BatchNorm2d(in_chans),
                nn.Hardswish(),
                nn.Conv2d(in_chans, embed_dim, kernel_size=1, stride=1, padding=0),
                nn.BatchNorm2d(embed_dim), )
            self.skip = nn.Sequential(
                nn.Conv2d(in_chans, embed_dim, kernel_size=1, stride=2, padding=0),
                nn.BatchNorm2d(embed_dim))
        elif self.asub:
            self.attn = attn_block(dim=in_chans, out_dim=embed_dim,
                                   resolution=resolution, act_layer=act_layer)
            patch_size = to_2tuple(patch_size)
            stride = to_2tuple(stride)
            padding = to_2tuple(padding)
            self.conv = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size,
                                  stride=stride, padding=padding)
            self.bn = norm_layer(embed_dim) if norm_layer else nn.Identity()
        else:
            patch_size = to_2tuple(patch_size)
            stride = to_2tuple(stride)
            padding = to_2tuple(padding)
            self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size,
                                  stride=stride, padding=padding)
            self.norm = norm_layer(embed_dim) if norm_layer else nn.Identity()

    def forward(self, x):
        if self.light:
            out = self.new_proj(x) + self.skip(x)
        elif self.asub:
            out_conv = self.conv(x)
            out_conv = self.bn(out_conv)
            out = self.attn(x) + out_conv
        else:
            x = self.proj(x)
            out = self.norm(x)
        return out


class Mlp(nn.Module):
    """
    Implementation of MLP with 1*1 convolutions.
    Input: tensor with shape [B, C, H, W]
    """

    def __init__(self, in_features, hidden_features=None,
                 out_features=None, act_layer=nn.GELU, drop=0., mid_conv=False):
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.mid_conv = mid_conv
        self.fc1 = nn.Conv2d(in_features, hidden_features, 1)
        self.act = act_layer()
        self.fc2 = nn.Conv2d(hidden_features, out_features, 1)
        self.drop = nn.Dropout(drop)
        self.apply(self._init_weights)

        if self.mid_conv:
            self.mid = nn.Conv2d(hidden_features, hidden_features, kernel_size=3, stride=1, padding=1,
                                 groups=hidden_features)
            self.mid_norm = nn.BatchNorm2d(hidden_features)

        self.norm1 = nn.BatchNorm2d(hidden_features)
        self.norm2 = nn.BatchNorm2d(out_features)

    def _init_weights(self, m):
        if isinstance(m, nn.Conv2d):
            trunc_normal_(m.weight, std=.02)
            if m.bias is not None:
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        x = self.fc1(x)
        x = self.norm1(x)
        x = self.act(x)

        if self.mid_conv:
            x_mid = self.mid(x)
            x_mid = self.mid_norm(x_mid)
            x = self.act(x_mid)
        x = self.drop(x)

        x = self.fc2(x)
        x = self.norm2(x)

        x = self.drop(x)
        return x


class AttnFFN(nn.Module):
    # Transformer block: Attention4D token mixer followed by the unified FFN.
    def __init__(self, dim, mlp_ratio=4.,
                 act_layer=nn.ReLU, norm_layer=nn.LayerNorm,
                 drop=0., drop_path=0.,
                 use_layer_scale=True, layer_scale_init_value=1e-5,
                 resolution=7, stride=None):
        super().__init__()
        self.token_mixer = Attention4D(dim, resolution=resolution, act_layer=act_layer, stride=stride)
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim,
                       act_layer=act_layer, drop=drop, mid_conv=True)

        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        self.use_layer_scale = use_layer_scale
        if use_layer_scale:
            self.layer_scale_1 = nn.Parameter(
                layer_scale_init_value * torch.ones(dim).unsqueeze(-1).unsqueeze(-1), requires_grad=True)
            self.layer_scale_2 = nn.Parameter(
                layer_scale_init_value * torch.ones(dim).unsqueeze(-1).unsqueeze(-1), requires_grad=True)

    def forward(self, x):
        if self.use_layer_scale:
            x = x + self.drop_path(self.layer_scale_1 * self.token_mixer(x))
            x = x + self.drop_path(self.layer_scale_2 * self.mlp(x))
        else:
            x = x + self.drop_path(self.token_mixer(x))
            x = x + self.drop_path(self.mlp(x))
        return x


class FFN(nn.Module):
    # Pure unified-FFN block (no attention token mixer).
    def __init__(self, dim, pool_size=3, mlp_ratio=4.,
                 act_layer=nn.GELU,
                 drop=0., drop_path=0.,
                 use_layer_scale=True, layer_scale_init_value=1e-5):
        super().__init__()
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim,
                       act_layer=act_layer, drop=drop, mid_conv=True)

        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        self.use_layer_scale = use_layer_scale
        if use_layer_scale:
            self.layer_scale_2 = nn.Parameter(
                layer_scale_init_value * torch.ones(dim).unsqueeze(-1).unsqueeze(-1), requires_grad=True)

    def forward(self, x):
        if self.use_layer_scale:
            x = x + self.drop_path(self.layer_scale_2 * self.mlp(x))
        else:
            x = x + self.drop_path(self.mlp(x))
        return x


def eformer_block(dim, index, layers,
                  pool_size=3, mlp_ratio=4.,
                  act_layer=nn.GELU, norm_layer=nn.LayerNorm,
                  drop_rate=.0, drop_path_rate=0.,
                  use_layer_scale=True, layer_scale_init_value=1e-5, vit_num=1, resolution=7, e_ratios=None):
    # Build one stage: the last vit_num blocks of stages 2 and 3 use AttnFFN, the rest use FFN.
    blocks = []
    for block_idx in range(layers[index]):
        block_dpr = drop_path_rate * (block_idx + sum(layers[:index])) / (sum(layers) - 1)
        mlp_ratio = e_ratios[str(index)][block_idx]
        if index >= 2 and block_idx > layers[index] - 1 - vit_num:
            if index == 2:
                stride = 2
            else:
                stride = None
            blocks.append(AttnFFN(
                dim, mlp_ratio=mlp_ratio,
                act_layer=act_layer, norm_layer=norm_layer,
                drop=drop_rate, drop_path=block_dpr,
                use_layer_scale=use_layer_scale,
                layer_scale_init_value=layer_scale_init_value,
                resolution=resolution,
                stride=stride, ))
        else:
            blocks.append(FFN(
                dim, pool_size=pool_size, mlp_ratio=mlp_ratio,
                act_layer=act_layer,
                drop=drop_rate, drop_path=block_dpr,
                use_layer_scale=use_layer_scale,
                layer_scale_init_value=layer_scale_init_value, ))
    blocks = nn.Sequential(*blocks)
    return blocks


class EfficientFormerV2(nn.Module):
    def __init__(self, layers, embed_dims=None,
                 mlp_ratios=4, downsamples=None,
                 pool_size=3,
                 norm_layer=nn.BatchNorm2d, act_layer=nn.GELU,
                 num_classes=1000,
                 down_patch_size=3, down_stride=2, down_pad=1,
                 drop_rate=0., drop_path_rate=0.,
                 use_layer_scale=True, layer_scale_init_value=1e-5,
                 fork_feat=True,
                 vit_num=0,
                 resolution=640,
                 e_ratios=expansion_ratios_L,
                 **kwargs):
        super().__init__()

        if not fork_feat:
            self.num_classes = num_classes
        self.fork_feat = fork_feat

        self.patch_embed = stem(3, embed_dims[0], act_layer=act_layer)

        network = []
        for i in range(len(layers)):
            stage = eformer_block(embed_dims[i], i, layers,
                                  pool_size=pool_size, mlp_ratio=mlp_ratios,
                                  act_layer=act_layer, norm_layer=norm_layer,
                                  drop_rate=drop_rate,
                                  drop_path_rate=drop_path_rate,
                                  use_layer_scale=use_layer_scale,
                                  layer_scale_init_value=layer_scale_init_value,
                                  resolution=math.ceil(resolution / (2 ** (i + 2))),
                                  vit_num=vit_num,
                                  e_ratios=e_ratios)
            network.append(stage)
            if i >= len(layers) - 1:
                break
            if downsamples[i] or embed_dims[i] != embed_dims[i + 1]:
                # downsampling between two stages
                if i >= 2:
                    asub = True
                else:
                    asub = False
                network.append(
                    Embedding(
                        patch_size=down_patch_size, stride=down_stride,
                        padding=down_pad,
                        in_chans=embed_dims[i], embed_dim=embed_dims[i + 1],
                        resolution=math.ceil(resolution / (2 ** (i + 2))),
                        asub=asub,
                        act_layer=act_layer, norm_layer=norm_layer, ))

        self.network = nn.ModuleList(network)

        if self.fork_feat:
            # add a norm layer for each output
            self.out_indices = [0, 2, 4, 6]
            for i_emb, i_layer in enumerate(self.out_indices):
                if i_emb == 0 and os.environ.get('FORK_LAST3', None):
                    layer = nn.Identity()
                else:
                    layer = norm_layer(embed_dims[i_emb])
                layer_name = f'norm{i_layer}'
                self.add_module(layer_name, layer)

        self.width_list = [i.size(1) for i in self.forward(torch.randn(1, 3, resolution, resolution))]

    def forward_tokens(self, x):
        outs = []
        for idx, block in enumerate(self.network):
            x = block(x)
            if self.fork_feat and idx in self.out_indices:
                norm_layer = getattr(self, f'norm{idx}')
                x_out = norm_layer(x)
                outs.append(x_out)
        return outs

    def forward(self, x):
        x = self.patch_embed(x)
        x = self.forward_tokens(x)
        return x


def efficientformerv2_s0(pretrained=False, **kwargs):
    model = EfficientFormerV2(
        layers=EfficientFormer_depth['S0'],
        embed_dims=EfficientFormer_width['S0'],
        downsamples=[True, True, True, True, True],
        vit_num=2,
        drop_path_rate=0.0,
        e_ratios=expansion_ratios_S0,
        **kwargs)
    return model


def efficientformerv2_s1(pretrained=False, **kwargs):
    model = EfficientFormerV2(
        layers=EfficientFormer_depth['S1'],
        embed_dims=EfficientFormer_width['S1'],
        downsamples=[True, True, True, True],
        vit_num=2,
        drop_path_rate=0.0,
        e_ratios=expansion_ratios_S1,
        **kwargs)
    return model


def efficientformerv2_s2(pretrained=False, **kwargs):
    model = EfficientFormerV2(
        layers=EfficientFormer_depth['S2'],
        embed_dims=EfficientFormer_width['S2'],
        downsamples=[True, True, True, True],
        vit_num=4,
        drop_path_rate=0.02,
        e_ratios=expansion_ratios_S2,
        **kwargs)
    return model


def efficientformerv2_l(pretrained=False, **kwargs):
    model = EfficientFormerV2(
        layers=EfficientFormer_depth['L'],
        embed_dims=EfficientFormer_width['L'],
        downsamples=[True, True, True, True],
        vit_num=6,
        drop_path_rate=0.1,
        e_ratios=expansion_ratios_L,
        **kwargs)
    return model


if __name__ == '__main__':
    inputs = torch.randn((1, 3, 640, 640))
    model = efficientformerv2_l()
    res = model(inputs)
    for i in res:
        print(i.size())

4. Step-by-Step Guide to Adding EfficientFormerV2

The following shows how to modify the network structure. Changing the backbone involves a fairly complex set of steps, so I will also upload my tasks.py file to CSDN; if your own modification does not work, you can try replacing your tasks.py with mine, after which you only need to redo steps 1, 2, 3, and 5.
⭐ Please be careful throughout the modification process ⭐
4.1 Modification 1

First, locate the directory ultralytics/nn. Inside it, create a new directory named Addmodules; from now on this directory holds all of our improvement modules. Then create a new .py file inside that directory and paste in the code from Section 3. You can name the file after the improvement described in the article, or however suits your habits (a sample layout follows).
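For reference, the resulting layout should look roughly like the sketch below. The file name EfficientFormerV2.py is my own choice for this example; any name works as long as the import in the next step matches it.

    ultralytics/
        nn/
            Addmodules/
                __init__.py           # created in step 4.2
                EfficientFormerV2.py  # paste the code from Section 3 here
            tasks.py                  # modified in steps 4.3 - 4.7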
4.2 Modification 2

Second, create a new .py file named __init__.py inside the directory you just created (only one is needed), and import this article's improvement module inside it. The other code referenced there belongs to improvements that have not been released yet; if you do not have it, simply ignore it (an example follows).
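Assuming the file from step 4.1 is named EfficientFormerV2.py (an assumption; adjust the module name to whatever you called your file), the __init__.py only needs a single line:

    # ultralytics/nn/Addmodules/__init__.py
    from .EfficientFormerV2 import *  # re-export efficientformerv2_s0/s1/s2/l for tasks.py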
4.3 Modification 3

Third, open the file ultralytics/nn/tasks.py and import all of our improvement modules at the top. Even if you use several of my improvement mechanisms, this step only needs to be done once (an example follows).
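With the package from step 4.2 in place, the import at the top of ultralytics/nn/tasks.py would look roughly like this (one line covers every improvement module you add later):

    # at the top of ultralytics/nn/tasks.py
    from ultralytics.nn.Addmodules import *  # brings efficientformerv2_s0/s1/s2/l into scope for parse_model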
4.4 Modification 4

Add the two lines of code shown in the screenshot.
4.5 Modification 5
Find the spot at roughly line seven hundred (see the picture for the exact location) and modify it as the picture shows, adding the part inside the red box. Note that there are no parentheses after the names — they are just function names. My own file already has many entries added here; they will all be released later, so ignore the ones you do not have. A filled-in example follows this snippet.

elif m in {add the corresponding model names here; everything below stays the same}:
    m = m(*args)
    c2 = m.width_list  # return the channel list
    backbone = True
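Filled in for this article's backbone, the branch would look roughly like the sketch below (placement inside parse_model follows the screenshot; only the names inside the set are specific to this article):

    elif m in {efficientformerv2_s0, efficientformerv2_s1, efficientformerv2_s2, efficientformerv2_l}:
        m = m(*args)
        c2 = m.width_list  # list of output channels of the backbone stages
        backbone = True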
4.6 Modification 6

Replace the content inside the red box with the code below.
if isinstance(c2, list):
    m_ = m
    m_.backbone = True
else:
    m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
t = str(m)[8:-2].replace('__main__.', '')  # module type
m.np = sum(x.numel() for x in m_.parameters())  # number params
m_.i, m_.f, m_.type = i + 4 if backbone else i, f, t  # attach index, 'from' index, type
if verbose:
    LOGGER.info(f'{i:>3}{str(f):>20}{n_:>3}{m.np:10.0f}  {t:<45}{str(args):<30}')  # print
save.extend(x % (i + 4 if backbone else i) for x in ([f] if isinstance(f, int) else f) if x != -1)  # append to savelist
layers.append(m_)
if i == 0:
    ch = []
if isinstance(c2, list):
    ch.extend(c2)
    if len(c2) != 5:
        ch.insert(0, 0)
else:
    ch.append(c2)
4.7 Modification 7

Be very careful here: this is not the predict method of YOLOv8 at the top of the file, but RT-DETR's predict method at around line 400. The original method is shown in the picture; simply replace it with the code I provide.
The code is as follows:

def predict(self, x, profile=False, visualize=False, batch=None, augment=False, embed=None):
    """
    Perform a forward pass through the model.

    Args:
        x (torch.Tensor): The input tensor.
        profile (bool, optional): If True, profile the computation time for each layer. Defaults to False.
        visualize (bool, optional): If True, save feature maps for visualization. Defaults to False.
        batch (dict, optional): Ground truth data for evaluation. Defaults to None.
        augment (bool, optional): If True, perform data augmentation during inference. Defaults to False.
        embed (list, optional): A list of feature vectors/embeddings to return.

    Returns:
        (torch.Tensor): Model's output tensor.
    """
    y, dt, embeddings = [], [], []  # outputs
    for m in self.model[:-1]:  # except the head part
        if m.f != -1:  # if not from previous layer
            x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers
        if profile:
            self._profile_one_layer(m, x, dt)
        if hasattr(m, 'backbone'):
            x = m(x)
            if len(x) != 5:  # 0 - 5
                x.insert(0, None)
            for index, i in enumerate(x):
                if index in self.save:
                    y.append(i)
                else:
                    y.append(None)
            x = x[-1]  # the last output is passed on to the next layer
        else:
            x = m(x)  # run
            y.append(x if m.i in self.save else None)  # save output
        if visualize:
            feature_visualization(x, m.type, m.i, save_dir=visualize)
        if embed and m.i in embed:
            embeddings.append(nn.functional.adaptive_avg_pool2d(x, (1, 1)).squeeze(-1).squeeze(-1))  # flatten
            if m.i == max(embed):
                return torch.unbind(torch.cat(embeddings, 1), dim=0)
    head = self.model[-1]
    x = head([y[j] for j in head.f], batch)  # head inference
    return x
4.8 Modification 8

Replace the s shown below with 640. For some backbones this step can be skipped, but others will throw an error without it, so it is better to make the change anyway.
4.9 Fixing RT-DETR Not Printing GFLOPs

The GFLOPs calculation misbehaves and nothing is printed, so one more place needs an extra change. Open the file ultralytics/utils/torch_utils.py; it contains the code below. Modify it as shown in the picture — just find the right function (the 640 in the red box may differ from yours) — and then replace the entire function with the code I provide.

def get_flops(model, imgsz=640):
    """Return a YOLO model's FLOPs."""
    try:
        model = de_parallel(model)
        p = next(model.parameters())
        # stride = max(int(model.stride.max()), 32) if hasattr(model, 'stride') else 32  # max stride
        stride = 640
        im = torch.empty((1, 3, stride, stride), device=p.device)  # input image in BCHW format
        flops = thop.profile(deepcopy(model), inputs=[im], verbose=False)[0] / 1E9 * 2 if thop else 0  # stride GFLOPs
        imgsz = imgsz if isinstance(imgsz, list) else [imgsz, imgsz]  # expand if int/float
        return flops * imgsz[0] / stride * imgsz[1] / stride  # 640x640 GFLOPs
    except Exception:
        return 0
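After this change you can check that the computation cost prints again, for example with the model-info call below (a minimal sketch; it assumes the standard ultralytics API, and the yaml file name is a placeholder for your own yaml from Section 5):

    from ultralytics import RTDETR

    model = RTDETR('rtdetr-efficientformerv2.yaml')  # hypothetical file name - point this at your own yaml
    model.info()  # should now report layers, parameters and GFLOPs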
4.10 Optional Modification

Some readers' datasets contain images with unusual shapes that cause shape-mismatch errors during validation. If you hit such an error at validation time, you can fix the validation image size as follows: open the file ultralytics/models/yolo/detect/train.py, and in the build_dataset function of the DetectionTrainer class change the parameter rect=mode == 'val' to rect=False, as sketched below.
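Concretely, only the rect argument in build_dataset changes; the rest of the call is quoted from ultralytics and may differ slightly between versions:

    # ultralytics/models/yolo/detect/train.py, DetectionTrainer.build_dataset
    # before:
    #   return build_yolo_dataset(self.args, img_path, batch, self.data, mode=mode, rect=mode == "val", stride=gs)
    # after (rectangular batching disabled, so validation images use the fixed square imgsz):
    return build_yolo_dataset(self.args, img_path, batch, self.data, mode=mode, rect=False, stride=gs)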
5. YAML File for EfficientFormerV2

5.1 The YAML File
Copy the yaml file below and run it with the training code I provide. RT-DETR hyper-parameter tuning will be covered in later articles of the column; it is not included in this currently free part.
# Ultralytics YOLO, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr

# Parameters
nc: 80  # number of classes
scales:  # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]

# Any of the following variants can be swapped in:
# [efficientformerv2_s0, efficientformerv2_s1, efficientformerv2_s2, efficientformerv2_l]

backbone:
  # [from, repeats, module, args]
  - [-1, 1, efficientformerv2_s0, []]  # 4

head:
  - [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]]  # 5 input_proj.2
  - [-1, 1, AIFI, [1024, 8]]  # 6
  - [-1, 1, Conv, [256, 1, 1]]  # 7, Y5, lateral_convs.0
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]  # 8
  - [3, 1, Conv, [256, 1, 1, None, 1, 1, False]]  # 9 input_proj.1
  - [[-2, -1], 1, Concat, [1]]  # 10
  - [-1, 3, RepC3, [256, 0.5]]  # 11, fpn_blocks.0
  - [-1, 1, Conv, [256, 1, 1]]  # 12, Y4, lateral_convs.1
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]  # 13
  - [2, 1, Conv, [256, 1, 1, None, 1, 1, False]]  # 14 input_proj.0
  - [[-2, -1], 1, Concat, [1]]  # 15 cat backbone P4
  - [-1, 3, RepC3, [256, 0.5]]  # X3 (16), fpn_blocks.1
  - [-1, 1, Conv, [256, 3, 2]]  # 17, downsample_convs.0
  - [[-1, 12], 1, Concat, [1]]  # 18 cat Y4
  - [-1, 3, RepC3, [256, 0.5]]  # F4 (19), pan_blocks.0
  - [-1, 1, Conv, [256, 3, 2]]  # 20, downsample_convs.1
  - [[-1, 7], 1, Concat, [1]]  # 21 cat Y5
  - [-1, 3, RepC3, [256, 0.5]]  # F5 (22), pan_blocks.1
  - [[16, 19, 22], 1, RTDETRDecoder, [nc, 256, 300, 4, 8, 3]]  # Detect(P3, P4, P5)
5.2 Training Script

Create a train.py file, paste in the code below, replace the paths with your own, and run it to start training.
import warnings
from ultralytics import RTDETR
warnings.filterwarnings('ignore')

if __name__ == '__main__':
    model = RTDETR('replace with the yaml file you want to run')
    # model.load()  # optionally load pretrained weights for your variant
    model.train(data=r'replace with the path to your dataset',
                cache=False,
                imgsz=640,
                epochs=72,
                batch=4,
                workers=0,
                device='0',
                project='runs/RT-DETR-train',
                name='exp',
                # amp=True
                )
5.3 Screenshot of Successful Training

Below is a screenshot of a successful run, confirming that this improvement works; one epoch of training has completed (the image is too large to also capture the second epoch).
6. Summary

Starting today, the RT-DETR paper-oriented column is officially being updated. Its content will roll out quickly, with many updates in the short term, and the price will rise in steps, so if you want to study RT-DETR improvements with me, follow the column early. The column aims to be the best RT-DETR column available and to serve readers who want to publish papers.

Column link: RT-DETR paper-oriented column - continuously reproducing top-conference content (the paper harvester, RT-DETR)