Multilevel relation analysis and mining method for image-text data (cited by: 1)
Abstract Efficiently mining the hidden semantic associations between multimodal data is one of the key tasks in multimodal knowledge extraction. To mine the relations between image and text data at a finer granularity, a multilevel relation analysis and mining (MRAM) method is proposed. The BERT-Large (bidirectional encoder representations from transformers, large) model extracts text features and builds a text connection graph, while the Faster-RCNN network extracts image features to learn spatial position relations and semantic relations and builds an image connection graph, which completes the computation of semantic relations within each single modality. On this basis, a node segmentation method and a graph convolutional network with multi-head attention (GCN-MA) fuse the image-text relations at the local and global levels. In addition, to improve the efficiency of relation mining, an attention-based edge weight pruning strategy strengthens the representation of important branches and reduces interference from redundant information. The method was evaluated on the public Flickr30K, MSCOCO-1K, and MSCOCO-5K datasets and compared with 11 methods. The average recall rate increased by 0.97% and 0.57% on Flickr30K, by 0.93% and 0.63% on MSCOCO-1K, and by 0.37% and 0.93% on MSCOCO-5K. The experimental results verify the effectiveness of the proposed method.
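The abstract names an attention-based edge weight pruning strategy but does not give its details. The following is only a minimal sketch of the general idea, not the authors' implementation: it assumes scaled dot-product attention between node feature vectors, a fixed cut-off `threshold`, and renormalization of the surviving edge weights. The function names, the toy graph, and the feature values are all hypothetical.

```python
import math


def attention_weights(node, neighbors, features):
    """Softmax over scaled dot-product scores between a node and its neighbors."""
    dim = len(features[node])
    scores = [
        sum(a * b for a, b in zip(features[node], features[n])) / math.sqrt(dim)
        for n in neighbors
    ]
    peak = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prune_edges(graph, features, threshold=0.2):
    """Keep edges whose attention weight is >= threshold, then renormalize.

    Assumption: `threshold` is a hypothetical hyperparameter; the abstract
    does not say how the pruning cut-off is chosen.
    """
    pruned = {}
    for node, neighbors in graph.items():
        if not neighbors:
            pruned[node] = {}
            continue
        weights = attention_weights(node, neighbors, features)
        kept = {n: w for n, w in zip(neighbors, weights) if w >= threshold}
        if not kept:
            # Always keep the strongest edge so a connected node is not isolated.
            best = max(range(len(neighbors)), key=weights.__getitem__)
            kept = {neighbors[best]: weights[best]}
        total = sum(kept.values())
        pruned[node] = {n: w / total for n, w in kept.items()}
    return pruned


# Toy connection graph: adjacency lists plus made-up 2-d node features.
features = {
    "dog": [1.0, 0.0],
    "ball": [0.9, 0.1],
    "grass": [0.5, 0.5],
    "sky": [-1.0, 0.2],
}
graph = {"dog": ["ball", "grass", "sky"], "ball": ["dog"], "grass": [], "sky": ["dog"]}
pruned = prune_edges(graph, features)
```

In this toy graph the weakly related `dog`-`sky` edge falls below the threshold and is dropped, while the `dog`-`ball` and `dog`-`grass` edges are kept with weights rescaled to sum to 1. In a real pipeline the scores would presumably come from learned attention parameters rather than raw dot products.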
Authors GUO Ruiping (郭瑞萍); WANG Hairong (王海荣); WANG Dong (王栋) (School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China)
Source Journal of Beijing University of Aeronautics and Astronautics (《北京航空航天大学学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2024, No. 2, pp. 684-694 (11 pages)
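Funding Ningxia Natural Science Foundation (2023AAC03316); Key Scientific Research Project of Higher Education Institutions, Education Department of Ningxia Hui Autonomous Region (NYG2022051); North Minzu University Scientific Research Project (2021XYZJK06).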
Keywords relation mining; multilevel relation; attention mechanism; graph convolutional network; image-text data