Abstract
Objective Visible light and thermal infrared data are highly complementary, so RGBT (RGB-thermal) tracking has attracted increasing attention. Conventional RGBT trackers simply fuse the features of the two modalities, which limits tracking performance to some extent. This paper proposes a dynamic interaction and fusion method that collaboratively learns modality-specific and complementary representations for RGBT tracking. Method First, the features of the two modalities interact to generate multi-modality features, and an attention mechanism is applied in the modality-specific feature learning of each modality to enhance discriminability. Second, multi-modality features from different layers are fused to obtain rich spatial and semantic information, and a complementary feature learning module is designed to learn the complementary features of the two modalities. Finally, a dynamic weight loss function is proposed, which constrains the consistency and uncertainty of the predictions of the two modality-specific branches to adaptively optimize the parameters of the whole network. Result Experiments on two benchmark RGBT tracking datasets show that the proposed method achieves a precision rate (PR) of 79.2% and a success rate (SR) of 55.8% on RGBT234, and a PR of 86.1% and an SR of 70.9% on GTOT (grayscale-thermal object tracking). Comparative experiments on RGBT234 and GTOT further verify the effectiveness of the algorithm and show that it improves RGBT tracking results. Conclusion The proposed RGBT tracking algorithm effectively exploits the complementarity between the two modalities and achieves good tracking accuracy.
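To make the dynamic weight loss concrete, the following is a minimal PyTorch-style sketch of one plausible formulation, in which per-sample branch losses are modulated by prediction consistency and entropy-based uncertainty. The function name, the cross-entropy branch losses, and the exact weighting rule are illustrative assumptions, not the paper's actual definition.

import torch
import torch.nn.functional as F

def dynamic_weight_loss(logits_rgb, logits_tir, target):
    # Per-sample losses of the two modality-specific branches.
    loss_rgb = F.cross_entropy(logits_rgb, target, reduction="none")
    loss_tir = F.cross_entropy(logits_tir, target, reduction="none")
    p_rgb = F.softmax(logits_rgb, dim=1)
    p_tir = F.softmax(logits_tir, dim=1)
    # Consistency: agreement between the two predictive distributions.
    consistency = (p_rgb * p_tir).sum(dim=1)
    # Uncertainty: entropy of each branch; lower entropy -> larger weight.
    ent_rgb = -(p_rgb * p_rgb.clamp_min(1e-8).log()).sum(dim=1)
    ent_tir = -(p_tir * p_tir.clamp_min(1e-8).log()).sum(dim=1)
    w_rgb = torch.exp(-ent_rgb)
    w_tir = torch.exp(-ent_tir)
    # Detach the weights so they modulate the loss without becoming
    # additional optimization targets themselves.
    weighted = consistency.detach() * (w_rgb.detach() * loss_rgb
                                       + w_tir.detach() * loss_tir)
    return weighted.mean()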
Objective Visual object tracking is applied in computer vision tasks such as video surveillance, autonomous driving systems, and human-computer interaction. Thermal infrared cameras have the advantages of a long sensing range, strong penetrating ability, and the capability to reveal hidden objects. As a branch of visual tracking, RGBT (RGB-thermal) tracking aims to estimate the state of the target in a video sequence by aggregating complementary data from two different modalities, given the ground-truth bounding box in the first frame of the sequence. Previous RGBT tracking algorithms are constrained by traditional handcrafted features or are insufficient to explore and utilize the complementary information from different modalities. To explore the complementary information between the two modalities, we propose a dynamic interaction and fusion method for RGBT tracking. Method Generally, RGB images capture the visual appearance of the target (e.g., colors and textures), while thermal images acquire temperature information that is robust to lighting conditions and background clutter. To obtain more powerful representations, the useful information of the other modality can be introduced. However, because the obtained modality features carry some noise, the fusion of different modalities is commonly limited to simple addition or concatenation. First, a modality interaction module is presented to suppress clutter noise based on a multiplication operation. Second, a fusion module is designed to gather cross-modality features from all layers; it captures different abstractions of the target representation for more accurate localization. Third, a gate-guided complementary learning structure computes the complementary features of the different modalities. The gate takes as input the modality-specific features and the cross-modality features obtained from the fusion module, and its output is a numerical value; the complementary features are obtained by a dot product between this value and the cross-modality features. Finally, a dynamic weight loss function is proposed, which constrains the consistency and uncertainty of the predictions of the two modality-specific branches so that the parameters of the whole network are optimized adaptively. Result On the RGBT234 dataset, the proposed method achieves a precision rate of 79.2% and a success rate of 55.8%; on the GTOT (grayscale-thermal object tracking) dataset, it achieves a precision rate of 86.1% and a success rate of 70.9%, improving over the compared RGBT trackers. Conclusion The proposed algorithm effectively mines the complementarity between the two modalities and achieves good tracking accuracy.
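The interaction and gate mechanism described above can be read, under assumptions, as the following minimal PyTorch sketch. The element-wise multiplication, the pooled-feature gate, and all shapes and layer choices are hypothetical stand-ins for the paper's actual modules.

import torch
import torch.nn as nn

def modality_interaction(f_rgb, f_tir):
    # Element-wise multiplication keeps responses shared by both modalities
    # and suppresses clutter present in only one of them (assumed reading
    # of the multiplication-based interaction module).
    return f_rgb * f_tir

class ComplementaryGate(nn.Module):
    # Hypothetical gate: maps pooled modality-specific and cross-modality
    # features to a single value, then scales the cross-modality features.
    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Linear(3 * channels, 1)

    def forward(self, f_rgb, f_tir, f_cross):
        # All inputs: (B, C, H, W). Global-average-pool each, then concatenate.
        pooled = torch.cat([f.mean(dim=(2, 3)) for f in (f_rgb, f_tir, f_cross)], dim=1)
        g = torch.sigmoid(self.fc(pooled))  # (B, 1), the gate's numerical output
        # "Dot product" of the gate value with the cross-modality features,
        # i.e., a broadcast scaling that yields the complementary features.
        return f_cross * g.view(-1, 1, 1, 1)

For example, with channels=512 the gate scales a (B, 512, H, W) cross-modality map by a per-sample scalar in (0, 1), so unreliable fused evidence can be downweighted.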
Authors
王福田
张淑云
李成龙
罗斌
Wang Futian; Zhang Shuyun; Li Chenglong; Luo Bin (Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei 230000, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230000, China)
Source
《中国图象图形学报》
CSCD
Peking University Core Journals
2022, No. 10, pp. 3010-3021 (12 pages)
Journal of Image and Graphics
Funding
National Natural Science Foundation of China (62076003)
Collaborative Innovation Project of Anhui Universities (GXXT-2019-007)
Natural Science Foundation of Anhui Province (1908085MF206)
Keywords
modality interaction
modality fusion
complementary feature learning
modality-specific information
RGBT object tracking