Abstract
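To identify grapes quickly and accurately in complex orchard environments, this study proposes an improved grape detection model (MRW-YOLOv5s) based on YOLOv5s. First, to reduce the number of model parameters, the lightweight network MobileNetv3 is adopted as the feature extraction network, and a coordinate attention (CA) module is embedded in the bneck structure of MobileNetv3 to strengthen the network's feature extraction capability. Second, RepVGG Block is introduced into the neck network: fusing multi-branch features improves detection accuracy, and the structural reparameterization of RepVGG Block further speeds up inference. Finally, a loss based on a dynamic non-monotonic focusing mechanism (wise intersection over union loss, WIoU Loss) is used as the bounding box regression loss function, which accelerates network convergence and improves detection accuracy. The results show that the improved MRW-YOLOv5s model has only 7.56 M parameters and reaches a mean average precision (mAP) of 97.74% on the test set, 2.32 percentage points higher than the original YOLOv5s model. The average detection time per image is 10.03 ms, 6.13 ms less than the original YOLOv5s model. Compared with the mainstream object detection models SSD, RetinaNet, YOLOv4, YOLOv7, and YOLOX, the mAP of MRW-YOLOv5s is higher by 9.89, 7.53, 2.12, 0.91, and 2.42 percentage points, respectively, and the model also has a clear advantage in parameter count and detection speed. This study can provide technical support for intelligent orchards and mechanized picking.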
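The structural reparameterization of RepVGG Block mentioned above can be illustrated with a minimal sketch. It assumes a single-channel feature map and an identity branch (present only when input and output shapes match), and it omits the activation. All function names, kernels, and BatchNorm statistics are hypothetical illustration values, not taken from the paper. The sketch folds each branch's BatchNorm into its convolution, pads the 1x1 and identity kernels to 3x3, and sums the three branches into one 3x3 convolution that gives the same output.

```python
import math

EPS = 1e-5  # BatchNorm epsilon (assumed value)

def fuse_conv_bn(kernel, gamma, beta, mean, var):
    """Fold inference-mode BatchNorm into the preceding bias-free conv."""
    scale = gamma / math.sqrt(var + EPS)
    fused_kernel = [[w * scale for w in row] for row in kernel]
    fused_bias = beta - mean * scale
    return fused_kernel, fused_bias

def pad_1x1_to_3x3(k):
    """Place a 1x1 weight at the centre of an all-zero 3x3 kernel."""
    return [[0.0, 0.0, 0.0], [0.0, k, 0.0], [0.0, 0.0, 0.0]]

def conv3x3(image, kernel, bias):
    """Single-channel 3x3 cross-correlation, stride 1, zero padding 1."""
    h, w = len(image), len(image[0])
    out = [[bias] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in range(3):
                for dj in range(3):
                    y, x = i + di - 1, j + dj - 1
                    if 0 <= y < h and 0 <= x < w:
                        out[i][j] += kernel[di][dj] * image[y][x]
    return out

def branch_kernels(k3, k1):
    """3x3 kernels of the three branches: 3x3 conv, 1x1 conv, identity."""
    return [k3, pad_1x1_to_3x3(k1), pad_1x1_to_3x3(1.0)]

def multi_branch_forward(image, k3, k1, bns):
    """Training-time form: three conv + BN branches, summed."""
    h, w = len(image), len(image[0])
    total = [[0.0] * w for _ in range(h)]
    for kernel, (gamma, beta, mean, var) in zip(branch_kernels(k3, k1), bns):
        raw = conv3x3(image, kernel, 0.0)
        scale = gamma / math.sqrt(var + EPS)
        for i in range(h):
            for j in range(w):
                total[i][j] += (raw[i][j] - mean) * scale + beta
    return total

def reparameterize(k3, k1, bns):
    """Inference-time form: merge all branches into one 3x3 kernel and bias."""
    fused = [fuse_conv_bn(k, *bn) for k, bn in zip(branch_kernels(k3, k1), bns)]
    kernel = [[sum(f[0][r][c] for f in fused) for c in range(3)] for r in range(3)]
    bias = sum(f[1] for f in fused)
    return kernel, bias

# Hypothetical inputs: a 3x3 feature map, branch weights, and BN statistics
# given as (gamma, beta, running_mean, running_var) per branch.
image = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0], [2.0, 1.0, 0.0]]
k3 = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.1, -0.3]]
k1 = 0.7
bns = [(1.2, 0.1, 0.05, 0.9), (0.8, -0.2, 0.1, 1.1), (1.0, 0.0, 0.2, 1.0)]

branch_out = multi_branch_forward(image, k3, k1, bns)
fused_kernel, fused_bias = reparameterize(k3, k1, bns)
fused_out = conv3x3(image, fused_kernel, fused_bias)
max_diff = max(abs(a - b) for ra, rb in zip(branch_out, fused_out)
               for a, b in zip(ra, rb))
```

Because convolution and inference-mode BatchNorm are both linear, `max_diff` stays at floating-point rounding level, and the single fused convolution does less work at inference than the three branches. A multi-channel block would apply the same folding per output channel.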
Grapes are among the most popular fruits, with great nutritional value and economic benefits. As planting areas have expanded in recent years, manual picking of mature grapes can no longer fully meet the needs of large-scale production. A picking robot can monitor the growth of grapes in orchards in real time, and automatic grape picking can help realize intelligent agricultural production. In this study, an improved YOLOv5s model (MRW-YOLOv5s) was proposed to rapidly and accurately identify grapes in orchards. Firstly, the lightweight network MobileNetv3 was used as the feature extraction network to reduce the number of model parameters. A coordinate attention (CA) module was also embedded into the bneck structure of MobileNetv3 to strengthen the feature extraction capability of the network. Secondly, RepVGG Block was introduced into the neck network, where multi-branch features were fused to improve the detection accuracy of the model. Moreover, the structural reparameterization of the RepVGG Block was used to further accelerate the inference speed of the model. Finally, Wise Intersection over Union Loss (WIoU Loss) with the dynamic non-monotonic focusing mechanism was taken as the bounding box regression loss function to accelerate network convergence and improve the detection accuracy of the model. Gradient-weighted class activation mapping (Grad-CAM) was used to show how well the grape targets were captured when the backbone network of the improved model was embedded with the CA module. Better performance was achieved than with the models embedded with Efficient Channel Attention (ECA) and the Convolutional Block Attention Module (CBAM). In addition, the convergence curves of the loss function showed that, with EIoU as the bounding box loss function, the bounding box regression loss converged most slowly and had the highest value after convergence. Once the CIoU and Wise-IoU v1 were taken as the bounding box loss functions, there were similar convergence speeds and loss val
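To make the bounding box losses compared above more concrete, here is a minimal sketch of Wise-IoU v1 for one predicted box and one ground-truth box. The abstract gives no formula, so the definition used is an assumption: the commonly cited form, in which the IoU loss is scaled by a distance-based factor, exp(squared centre distance / squared diagonal of the smallest enclosing box). The function names are hypothetical, and the dynamic non-monotonic focusing term of the final model is not included.

```python
import math

def iou_and_enclosing(box, gt):
    """Boxes are (x1, y1, x2, y2). Returns IoU and the enclosing box size."""
    inter_w = max(0.0, min(box[2], gt[2]) - max(box[0], gt[0]))
    inter_h = max(0.0, min(box[3], gt[3]) - max(box[1], gt[1]))
    inter = inter_w * inter_h
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_box + area_gt - inter
    # Width and height of the smallest box enclosing both boxes.
    enclose_w = max(box[2], gt[2]) - min(box[0], gt[0])
    enclose_h = max(box[3], gt[3]) - min(box[1], gt[1])
    return inter / union, enclose_w, enclose_h

def wiou_v1(box, gt):
    """Assumed Wise-IoU v1: distance-based factor times the IoU loss."""
    iou, enclose_w, enclose_h = iou_and_enclosing(box, gt)
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    gx, gy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    center_dist_sq = (cx - gx) ** 2 + (cy - gy) ** 2
    # In a training framework the denominator is usually treated as a
    # constant (detached from the gradient); plain Python has no gradients.
    factor = math.exp(center_dist_sq / (enclose_w ** 2 + enclose_h ** 2))
    return factor * (1.0 - iou)

# Hypothetical boxes: a perfect match and a partially overlapping prediction.
perfect = wiou_v1((0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0))
shifted = wiou_v1((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```

A perfect match gives zero loss, while a shifted prediction is penalized both for the lower IoU and, through the exponential factor, for the distance between the box centres.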
Authors
孙俊
吴兆祺
贾忆琳
宫东见
武小红
沈继锋
SUN Jun; WU Zhaoqi; JIA Yilin; GONG Dongjian; WU Xiaohong; SHEN Jifeng (School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China)
Source
《农业工程学报》
EI
CAS
CSCD
Peking University Core Journal (北大核心)
2023, Issue 18, pp. 192-200 (9 pages)
Transactions of the Chinese Society of Agricultural Engineering
Funding
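National Natural Science Foundation of China, General Program (31971788)
Project of the Faculty of Agricultural Equipment, Jiangsu University (NZXB20210210).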