Abstract
In object detection networks, accuracy generally rises with depth and parameter count, but a network with a very large number of parameters cannot meet real-time requirements when deployed on a low-compute edge AI chip. To address this, a lightweight object detection algorithm for AI chips, YOLOv5_RepVGG, is proposed on the basis of YOLOv5. First, the YOLOv5 backbone is improved with a newly designed RepVGG_X module: during training, image features are extracted through three branches, namely a 3×3 convolution, a 1×1 convolution and a direct (identity) connection; at inference, structural reparameterization fuses the 1×1 convolution and the direct connection into the 3×3 convolution, leaving a single 3×3 branch. Second, the YOLOv5 output layer is improved to make full use of the multi-scale information from the six downsampling stages in the backbone, and outputs feature maps at four scales. Finally, the lightweight network is deployed and verified on the domestic AI chip Hi3559AV100. Experimental results show that, compared with the conventional YOLOv5, the proposed algorithm reduces inference time on the AI chip to 18.6 ms at a cost of only 3 points of accuracy, nearly doubling the speed, which can meet the growing real-time requirements of AI computing tasks in edge scenarios.
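The branch fusion described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' RepVGG_X implementation. It assumes stride 1, equal input and output channel counts, no bias, and no batch normalization or activation (none of which the abstract specifies), and all function names are hypothetical. It shows only that a 3×3 convolution, a 1×1 convolution and an identity connection can be folded into one equivalent 3×3 kernel.

```python
import random


def conv2d(x, w, pad):
    """Stride-1 cross-correlation with zero padding.

    x: [C_in][H][W], w: [C_out][C_in][k][k]. Output keeps the H x W size,
    which holds for the two cases used here: (k=3, pad=1) and (k=1, pad=0).
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    k = len(w[0][0])
    out = []
    for o in range(len(w)):
        plane = [[0.0] * wd for _ in range(h)]
        for i in range(h):
            for j in range(wd):
                s = 0.0
                for c in range(c_in):
                    for di in range(k):
                        for dj in range(k):
                            y, xx = i + di - pad, j + dj - pad
                            if 0 <= y < h and 0 <= xx < wd:
                                s += w[o][c][di][dj] * x[c][y][xx]
                plane[i][j] = s
        out.append(plane)
    return out


def multi_branch(x, w3, w1):
    """Training-time form: 3x3 conv + 1x1 conv + identity, summed."""
    y3 = conv2d(x, w3, pad=1)
    y1 = conv2d(x, w1, pad=0)
    return [[[y3[c][i][j] + y1[c][i][j] + x[c][i][j]
              for j in range(len(x[0][0]))]
             for i in range(len(x[0]))]
            for c in range(len(x))]


def fuse_branches(w3, w1):
    """Inference-time form: fold the 1x1 and identity branches into w3."""
    c = len(w3)
    fused = [[[row[:] for row in w3[o][i]] for i in range(c)] for o in range(c)]
    for o in range(c):
        for i in range(c):
            # A 1x1 kernel equals a 3x3 kernel that is zero except at the centre.
            fused[o][i][1][1] += w1[o][i][0][0]
        # Identity equals a 3x3 kernel with 1 at the centre of channel o -> o.
        fused[o][o][1][1] += 1.0
    return fused


def demo(seed=0, channels=2, size=5):
    """Return the largest output gap between the two forms on random data."""
    rng = random.Random(seed)
    x = [[[rng.uniform(-1, 1) for _ in range(size)] for _ in range(size)]
         for _ in range(channels)]
    w3 = [[[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
           for _ in range(channels)] for _ in range(channels)]
    w1 = [[[[rng.uniform(-1, 1)]] for _ in range(channels)]
          for _ in range(channels)]
    train_out = multi_branch(x, w3, w1)
    infer_out = conv2d(x, fuse_branches(w3, w1), pad=1)
    return max(abs(a - b)
               for pt, pi in zip(train_out, infer_out)
               for rt, ri in zip(pt, pi)
               for a, b in zip(rt, ri))
```

Calling `demo()` compares the three-branch output with the fused single-branch output; the two should agree up to floating-point rounding. This equivalence is why the single-branch network can replace the three-branch one at inference without retraining.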
Authors
曹朋军
傅哲
CAO Pengjun; FU Zhe (Institute of Automatic Control, Xi’an Jiaotong University, Xi’an 710000, China)
Source
《现代电子技术》
2023, No. 6, pp. 169-174 (6 pages)
Modern Electronics Technique