Abstract
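Objective Cross-media retrieval uses data of any media type as a query to retrieve relevant data of other media types, enabling semantic interoperability and cross retrieval among different media such as images and text. However, the "heterogeneity gap" makes the feature representations of different media inconsistent, so semantic correlations between them are hard to establish, which makes cross-media retrieval highly challenging. At the same time, data of different media that describe the same semantics are semantically consistent, and each item contains rich fine-grained information, which provides an important basis for cross-media correlation learning. Existing methods consider only the pairwise correlation between data of different media and ignore the context information among fine-grained local parts within each item, so they cannot fully exploit cross-media correlation. To address this problem, we propose a cross-media retrieval method based on a hierarchical recurrent attention network.
Method First, an intra-media and inter-media two-level recurrent neural network is proposed: the bottom-level networks separately model the fine-grained context information within each media type, and the top-level network mines the context correlation between different media types by sharing parameters. Then, an attention-based cross-media joint loss function is proposed. It learns inter-media joint attention to mine more precise fine-grained cross-media correlation, and it uses semantic category information to strengthen semantic discrimination during correlation learning, thereby improving the accuracy of cross-media retrieval.
Result The proposed method is compared with 10 existing methods on 2 widely used cross-media datasets, using mean average precision (MAP) as the evaluation metric. The experimental results show that the proposed method achieves MAP scores of 0.469 and 0.575 on the two datasets, respectively, outperforming all compared methods.
Conclusion By exploiting the fine-grained information of images and text, the proposed hierarchical recurrent attention network model fully learns precise cross-media correlations between images and text and effectively improves the accuracy of cross-media retrieval.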
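The MAP metric reported in the Result part averages, over all queries, the average precision (AP) of each ranked result list. A minimal pure-Python sketch follows, assuming, as is common in cross-media retrieval benchmarks, that a retrieved item counts as relevant when it shares the query's semantic category. The function names and toy labels are hypothetical. AP here is normalized by the number of relevant items found in the ranked list, which equals the total number of relevant items when the whole database is ranked. The abstract does not say whether the paper ranks the full database or uses a cut-off, so this illustrates the metric only, not the paper's evaluation code.

```python
from typing import List, Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """AP of one ranked list; relevance[i] is True if the i-th result is relevant."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / hits if hits else 0.0


def mean_average_precision(query_labels: Sequence[int],
                           ranked_labels: Sequence[Sequence[int]]) -> float:
    """MAP over queries; ranked_labels[q] lists the labels of query q's results, best first."""
    aps: List[float] = []
    for q_label, results in zip(query_labels, ranked_labels):
        # Assumption: relevance is defined by sharing the query's semantic category.
        aps.append(average_precision([label == q_label for label in results]))
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Two hypothetical image-to-text queries with category labels 0 and 1;
    # each inner list holds the categories of the retrieved texts, best first.
    queries = [0, 1]
    rankings = [[0, 2, 0, 1], [2, 1, 1, 0]]
    print(round(mean_average_precision(queries, rankings), 4))  # prints 0.7083
```

The first query has relevant results at ranks 1 and 3 (AP = (1 + 2/3) / 2), the second at ranks 2 and 3 (AP = (1/2 + 2/3) / 2), and MAP is the mean of the two.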
Objective Cross-media retrieval aims to retrieve data of different media types with a query of any one media type, which provides a flexible and useful retrieval experience that meets today's numerous user demands. However, the "heterogeneity gap" leads to inconsistent representations of different media types, which makes it challenging to construct correlations between them and to realize cross-media retrieval. Nevertheless, data of different media types that share the same semantics are naturally semantically consistent, and their patches contain abundant fine-grained information, which provides key clues for cross-media correlation learning. Existing methods mostly consider the pairwise correlation between data of different media types with the same semantics but ignore the context information among fine-grained patches, so they cannot fully capture cross-media correlation. To address this problem, a cross-media hierarchical recurrent attention network (CHRAN) is proposed to fully consider intra- and inter-media fine-grained context information.
Method First, we construct a hierarchical recurrent network to fully exploit cross-media fine-grained context information. Specifically, the hierarchical recurrent network consists of two levels, both implemented with long short-term memory (LSTM) networks. We extract features from the fine-grained patches of different media types and organize them into sequences, which serve as the inputs of the hierarchical network. The bottom level models the intra-media fine-grained context information, whereas the top level adopts a weight-sharing constraint to fully exploit inter-media context correlation by sharing the knowledge learned from different media types. Thus, the hierarchical recurrent network provides intra- and inter-media fine-grained hints that boost cross-media correlation learning. Second, we propose an attention-based cross-media joint embedding loss to learn cross-media correlation.
We utilize an attention mechanism to allow the models to focus on
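The two-level recurrent structure described in the Method part can be illustrated with a small pure-Python sketch. It makes several simplifying assumptions that are not from the paper: a plain tanh (Elman) recurrent cell stands in for the LSTM, weights are random rather than trained, feature dimensions and patch counts are toy values, and the weight-sharing constraint is modeled by passing both media through one top-level network object. All class and function names are hypothetical, and the attention-based joint embedding loss is not reproduced here.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Small random weights; a trained model would learn these.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class SimpleRNN:
    """Elman cell h_t = tanh(W x_t + U h_{t-1} + b); a simplified stand-in for an LSTM."""

    def __init__(self, in_dim: int, hid_dim: int, rng: random.Random) -> None:
        self.W = rand_matrix(hid_dim, in_dim, rng)
        self.U = rand_matrix(hid_dim, hid_dim, rng)
        self.b = [0.0] * hid_dim
        self.hid_dim = hid_dim

    def run(self, seq: List[Vector]) -> List[Vector]:
        """Return the hidden state after each element of the input sequence."""
        h = [0.0] * self.hid_dim
        states: List[Vector] = []
        for x in seq:
            pre = [a + c + d for a, c, d in zip(matvec(self.W, x), matvec(self.U, h), self.b)]
            h = [math.tanh(p) for p in pre]
            states.append(h)
        return states


class HierarchicalEncoder:
    """Bottom level: one recurrent network per media type (intra-media context).
    Top level: a single recurrent network applied to both media (shared weights)."""

    def __init__(self, img_dim: int, txt_dim: int, hid_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.bottom = {
            "image": SimpleRNN(img_dim, hid_dim, rng),
            "text": SimpleRNN(txt_dim, hid_dim, rng),
        }
        self.top = SimpleRNN(hid_dim, hid_dim, rng)  # one object, so both media use the same weights

    def encode(self, patches: List[Vector], media: str) -> List[Vector]:
        if media not in self.bottom:
            raise ValueError(f"unknown media type: {media}")
        intra = self.bottom[media].run(patches)  # fine-grained context within one media type
        return self.top.run(intra)               # context modeled with shared parameters


def mean_pool(states: List[Vector]) -> Vector:
    return [sum(col) / len(states) for col in zip(*states)]


def cosine(u: Vector, v: Vector) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    rng = random.Random(1)
    enc = HierarchicalEncoder(img_dim=4, txt_dim=3, hid_dim=5)
    # Hypothetical inputs: 6 image-patch features and 4 text-fragment features.
    image_patches = [[rng.random() for _ in range(4)] for _ in range(6)]
    text_patches = [[rng.random() for _ in range(3)] for _ in range(4)]
    img_states = enc.encode(image_patches, "image")
    txt_states = enc.encode(text_patches, "text")
    print(len(img_states), len(txt_states))  # prints 6 4
    print(cosine(mean_pool(img_states), mean_pool(txt_states)))
```

Routing both media through one top-level network means that, during training, gradients from image and text sequences would update the same parameters, which is one straightforward way to realize a weight-sharing constraint; the paper may tie its top-level parameters differently.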
Authors
綦金玮
彭宇新
袁玉鑫
Qi Jinwei; Peng Yuxin; Yuan Yuxin (Institute of Computer Science and Technology, Peking University, Beijing 100080, China)
Source
《中国图象图形学报》
CSCD
Peking University Core Journal
2018, Issue 11, pp. 1751-1758 (8 pages)
Journal of Image and Graphics
Funding
National Natural Science Foundation of China (61771025, 61532005)
Keywords
cross-media retrieval
attention mechanism
recurrent network
correlation learning
semantic discrimination