Journal Article

RGBT Tracking Based on Dynamic Modal Interaction and Adaptive Feature Fusion (Cited by: 1)

RGBT tracking based on dynamic modal interaction and adaptive feature fusion
Abstract  Objective: Visible-light and thermal-infrared data are highly complementary, and RGBT (RGB-thermal) tracking has therefore attracted increasing attention. Traditional RGBT trackers simply fuse the features of the two modalities, which limits tracking performance to some extent. This paper proposes a method based on dynamic interaction and fusion that collaboratively learns modality-specific and complementary representations for RGBT tracking. Method: First, features of the different modalities interact to generate multimodal features, and an attention mechanism is applied within each modality-specific branch to improve discriminability. Second, multimodal features from different layers are fused to obtain rich spatial and semantic information, and a complementary feature learning module is designed to learn the complementary features of the different modalities. Finally, a dynamic weight loss function is proposed that constrains the network according to the consistency and uncertainty of the predictions of the two modality-specific branches, adaptively optimizing the parameters of the whole network. Result: Experiments on two benchmark RGBT tracking datasets show that on RGBT234 the method achieves a precision rate (PR) of 79.2% and a success rate (SR) of 55.8%, and on GTOT (grayscale-thermal object tracking) a PR of 86.1% and an SR of 70.9%. Comparative experiments on RGBT234 and GTOT further verify the effectiveness of the algorithm and show that it improves RGBT tracking results. Conclusion: The proposed RGBT tracking algorithm effectively exploits the complementarity between the two modalities and achieves good tracking accuracy.

Objective: Visual target tracking underpins computer-vision applications such as video surveillance, autonomous driving systems, and human-computer interaction. Thermal infrared cameras offer long range, strong penetrating ability, and the capacity to reveal hidden objects. As a branch of visual tracking, RGBT (RGB-thermal) tracking aims to estimate the state of a target throughout a video sequence by aggregating complementary data from two different modalities, given the ground-truth bounding box in the first frame of the sequence. Previous RGBT tracking algorithms are constrained by traditional handcrafted features or are insufficient to explore and exploit the complementary information between modalities. To exploit this complementary information, we propose a dynamic interaction and fusion method for RGBT tracking. Method: Generally, RGB images capture the visual appearance of a target (e.g., colors and textures), while thermal images acquire temperature information that is robust to lighting conditions and background clutter. To obtain more powerful representations, useful information from the other modality can be introduced. However, because the obtained modality features carry some noise, fusion is commonly limited to simple addition or concatenation. First, a modality interaction module suppresses clutter noise through a multiplication operation. Second, a fusion module gathers cross-modality features from all layers, capturing different abstractions of the target representation for more accurate localization. Third, a learning structure guided by a complementary gate mechanism computes the complementary features of the different modalities. The gate takes as input the modality-specific features and the cross-modality features obtained from the fusion module, and its output is a numerical value; the complementary features are obtained by a dot product of this value with the cross-modality features. Finally
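The multiplication-based interaction and the complementary gate described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the feature size, the gate weights `w`, `b`, and the use of a single linear layer before the sigmoid are all hypothetical choices made for the sketch.

```python
import numpy as np

def sigmoid(x):
    """Standard logistic function, used to squash the gate output into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def modal_interaction(f_rgb, f_tir):
    """Element-wise multiplication keeps responses confirmed by both
    modalities and suppresses single-modality clutter (hypothetical
    form of the paper's interaction module)."""
    return f_rgb * f_tir

def complementary_gate(f_specific, f_cross, w, b):
    """Gate takes modality-specific and cross-modality features, emits a
    scalar in (0, 1), and scales the cross-modality features by it.
    The linear projection (w, b) is illustrative only."""
    g = sigmoid(np.dot(w, np.concatenate([f_specific, f_cross])) + b)
    return g * f_cross

# Toy 8-dimensional features standing in for CNN feature maps.
rng = np.random.default_rng(0)
f_rgb = rng.random(8)
f_tir = rng.random(8)

f_cross = modal_interaction(f_rgb, f_tir)          # cross-modality features
w = rng.standard_normal(16)                        # illustrative gate weights
comp = complementary_gate(f_rgb, f_cross, w, 0.0)  # complementary features
```

Because both toy feature vectors lie in [0, 1), the multiplicative interaction can only attenuate each response, which is the clutter-suppression effect the abstract attributes to this module.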
Authors: 王福田 (Wang Futian); 张淑云 (Zhang Shuyun); 李成龙 (Li Chenglong); 罗斌 (Luo Bin). Affiliations: Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei 230000, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230000, China.
Source: Journal of Image and Graphics (《中国图象图形学报》, CSCD, PKU Core), 2022, No. 10, pp. 3010-3021 (12 pages).
Funding: National Natural Science Foundation of China (62076003); Collaborative Innovation Project of Anhui Universities (GXXT-2019-007); Natural Science Foundation of Anhui Province (1908085MF206).
Keywords: modality interaction; modality fusion; complementary feature learning; modality-specific information; RGBT object tracking