
Dual Gating-Residual Feature Fusion for Image-Text Cross-modal Retrieval
Abstract: With the rapid development of the Internet and social media, cross-modal retrieval has attracted extensive attention. The goal of cross-modal retrieval is to enable flexible retrieval across different modalities. Because of the heterogeneity gap between modalities, the similarity of features from different modalities cannot be computed directly, which makes it difficult to improve the accuracy of cross-modal retrieval. To narrow the heterogeneity gap between image and text data, this paper proposes a dual gating-residual feature fusion method for image-text cross-modal retrieval (DGRFF). By designing gating features and residual features to fuse the features of the image and text modalities, the method gains more effective feature information from the opposite modality, making the semantic feature information more comprehensive. Meanwhile, an adversarial loss is adopted to align the feature distributions of the two modalities, preserving the modality invariance of the fused features and yielding more discriminative feature representations in the common latent space. Finally, the model is trained with a combination of label prediction loss, cross-modal similarity loss, and adversarial loss. Experiments on the Wikipedia and Pascal Sentence datasets show that DGRFF performs well on cross-modal retrieval tasks.
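The gated-residual fusion described in the abstract can be sketched as follows. This is a minimal illustration only, not the paper's exact formulation: the abstract does not give the equations, so the function name `gated_residual_fusion`, the weight matrices `W_g` and `W_r`, and the tanh candidate transform are all assumptions. The idea it demonstrates is the one the abstract states: a gate controls how much information is admitted from the opposite modality, while a residual path preserves the original modality's feature.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic function; maps gate logits into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def gated_residual_fusion(x, y, W_g, W_r):
    """Fuse feature x with the opposite-modality feature y (hypothetical sketch).

    x   : feature vector of one modality, shape (d,)
    y   : feature vector of the other modality, shape (d,)
    W_g : gate projection, shape (2d, d)      -- assumed parameter
    W_r : candidate projection, shape (2d, d) -- assumed parameter
    """
    xy = np.concatenate([x, y])       # joint view of both modalities
    g = sigmoid(xy @ W_g)             # gating vector in (0, 1): how much to admit
    r = np.tanh(xy @ W_r)             # candidate cross-modal feature
    return x + g * r                  # residual connection keeps x intact
```

In this sketch the residual connection guarantees that when the gated candidate contributes nothing, the original feature passes through unchanged, which matches the abstract's motivation of enriching, rather than replacing, each modality's representation. The joint objective the abstract mentions would then combine a label prediction loss, a cross-modal similarity loss, and an adversarial loss on top of such fused features.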
Authors: ZHANG Changfan; MA Yuanyuan; LIU Jianhua; HE Jing (School of Electrical and Information Engineering, Hunan University of Technology, Zhuzhou, Hunan 412007, China; School of Rail Transit, Hunan University of Technology, Zhuzhou, Hunan 412007, China)
Source: Computer Science (《计算机科学》), CSCD, Peking University core journal, 2023, No. S01, pp. 481-487 (7 pages)
Funding: National Natural Science Foundation of China (52172403, 62173137, 52272347); Natural Science Foundation of Hunan Province (2021JJ50001, 2021JJ30217).
Keywords: Cross-modal retrieval; Heterogeneity gap; Gating features; Residual features; Feature fusion