Abstract
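Objective Feature fusion is one of the effective means of improving the detection of difficult targets such as blurred images, small objects, and occluded objects. To use feature fusion more effectively to integrate feature information from different network levels and to emphasize the important features among them, this paper proposes FDA-SSD (fusion double attention single shot multibox detector), a single-stage object detection algorithm based on optimized fusion-strategy selection and a dual attention mechanism. Method A method for optimized selection of the fusion strategy is designed and combined with the feature pyramid network (FPN) to determine the optimal combination of multi-layer feature maps and the fusion process. A dual attention module is then attached; by redistributing the weights of the individual channels and spatial features, it increases the model's sensitivity to channel features and spatial information, finally producing a group of feature maps that carry rich semantic information and highlight important features. Results Comparative experiments were conducted on the public datasets PASCAL VOC2007 (pattern analysis, statistical modelling and computational learning visual object classes) and TGRS-HRRSD-Dataset (high resolution remote sensing detection). On the PASCAL VOC2007 test set with 300×300-pixel input, the FDA-SSD model reaches an accuracy of 79.8%, which is 2.6%, 1.3%, 1.2%, and 1.0% higher than the SSD (single shot multibox detector), RSSD (rainbow SSD), DSSD (de-convolution SSD), and FSSD (feature fusion SSD) models, respectively. Its detection speed on a Titan X is 47 frames per second (FPS), comparable to the SSD algorithm and 12 FPS and 37.5 FPS higher than the RSSD and DSSD models, respectively. On the TGRS-HRRSD-Dataset test set with 300×300-pixel input, the accuracy is 84.2%; on a Tesla V100, the detection speed is 10% higher than that of the SSD model while the accuracy improves by 1.5%. Conclusion Introducing fusion-strategy selection and a dual attention mechanism into a single-stage object detection model improves prediction speed and accuracy at the same time, and substantially improves the detection of difficult targets such as small objects, occluded objects, and blurred images.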
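The fusion-then-attention pipeline described in the Method sentence can be illustrated with a minimal, parameter-free sketch. It is not the FDA-SSD implementation: the abstract gives no layer details, so the helper names (`upsample2x`, `fuse_top_down`, `channel_attention`, `spatial_attention`), the channel-then-spatial order, and the use of plain average pooling with a sigmoid gate instead of learned convolutions are all assumptions made for illustration. Feature maps are nested lists indexed as [channel][row][column].

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a [C][H][W] feature map."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat each column
            rows.append(wide)
            rows.append(list(wide))                    # repeat each row
        out.append(rows)
    return out

def fuse_top_down(shallow, deep):
    """FPN-style fusion: upsample the deeper map and add it element-wise.
    Assumption: the lateral and smoothing convolutions of a real FPN are omitted."""
    up = upsample2x(deep)
    return [[[s + d for s, d in zip(srow, drow)]
             for srow, drow in zip(sch, dch)]
            for sch, dch in zip(shallow, up)]

def channel_attention(fmap):
    """Scale each channel by a gate computed from its global average.
    Assumption: no learned layers between pooling and the sigmoid."""
    out = []
    for channel in fmap:
        count = sum(len(row) for row in channel)
        mean = sum(sum(row) for row in channel) / count
        gate = sigmoid(mean)
        out.append([[v * gate for v in row] for row in channel])
    return out

def spatial_attention(fmap):
    """Scale each position by a gate computed from its cross-channel average."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    gates = [[sigmoid(sum(fmap[k][i][j] for k in range(c)) / c)
              for j in range(w)] for i in range(h)]
    return [[[fmap[k][i][j] * gates[i][j] for j in range(w)]
             for i in range(h)] for k in range(c)]

def dual_attention(fmap):
    """Channel re-weighting followed by spatial re-weighting (order assumed)."""
    return spatial_attention(channel_attention(fmap))

# Toy inputs: a 2-channel 2x2 shallow map and a 2-channel 1x1 deep map.
shallow = [[[0.0, 1.0], [2.0, 3.0]], [[0.0, 0.0], [0.0, 0.0]]]
deep = [[[1.0]], [[0.0]]]

fused = fuse_top_down(shallow, deep)
out = dual_attention(fused)
```

In this toy run the all-zero channel stays zero, while in the active channel positions with larger responses keep a larger share of their value after gating, which is the "weight redistribution" effect the abstract refers to.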
Objective Object detection is an essential task in computer vision and deep learning. It is widely used in industrial inspection, intelligent transportation, face recognition, and other contexts. Mainstream object detection algorithms fall into two main categories. The first is two-stage algorithms, such as region-based convolutional neural network (R-CNN), Fast R-CNN, online hard example mining (OHEM), Faster R-CNN, and Mask R-CNN, which first generate candidate boxes and then classify and regress them. The second is single-stage algorithms, such as you only look once (YOLO) and single shot multibox detector (SSD). In addition, anchor-free models such as corner network (CornerNet) and center network (CenterNet) dispense with anchor boxes and perform detection and matching based on key points; they achieve quite good results, but a small gap remains relative to anchor-based detection methods. In practical applications of single-stage object detection, the main challenges are detecting difficult targets, such as blurred images, small objects, and occluded objects, and improving prediction performance and efficiency. Feature fusion can effectively improve the detection of difficult targets by fusing deep and shallow features of the network, and it is commonly used in improved SSD models. However, most of these models apply feature fusion directly without examining the specific fusion strategy, such as which feature maps to fuse and how to process them. In addition, attention mechanisms can give a feature map a certain “focus” effect by assigning weights along its dimensions. How to combine attention mechanisms effectively with single-stage object detection is worth exploring.
Method The shallow Visual Geometry Group (VGG) network in the original SSD algorithm is replaced by a deep residual network as the backbone network. First, an optimized selection meth
Authors
戴坤
许立波
黄世旸
李鋆铃
Dai Kun; Xu Libo; Huang Shiyang; Li Yunling (School of Computer and Data Engineering, NingboTech University, Ningbo 315000, China)
Source
《中国图象图形学报》
CSCD
Peking University Core Journal
2022, No. 8, pp. 2430-2443 (14 pages)
Journal of Image and Graphics
Funding
National Natural Science Foundation of China (61872321)
Ningbo Science and Technology Innovation 2025 Major Project (2019B10036, 2020Z005).
Keywords
single-stage object detection
single shot multibox detector(SSD)
feature pyramid network(FPN)
feature fusion
attention mechanism