Stacked Attention Networks for Referring Expressions Comprehension


Abstract: Referring expression comprehension is the task of locating the image region described by a natural language expression, which refers to the properties of the region or to its relationships with other regions. Most previous work handles this problem by selecting the most relevant regions from a set of candidate regions; these methods are inefficient when the set contains many candidates. Inspired by the recent success of deep learning methods in image captioning, in this paper we propose a framework that understands referring expressions through multiple steps of reasoning. We present a model for referring expression comprehension that selects the most relevant region directly from the image. The core of our model is a recurrent attention network, which can be seen as an extension of the Memory Network. The proposed model is capable of improving its results through multiple computational hops. We evaluate the proposed model on two referring expression datasets: Visual Genome and Flickr30k Entities. The experimental results demonstrate that the proposed model outperforms previous state-of-the-art methods in both accuracy and efficiency. We also conduct an ablation experiment showing that the performance of the model does not improve as the number of attention layers increases.
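
The idea of refining an attention map over several computational hops can be illustrated with a minimal sketch. This is not the authors' model: the dot-product scoring, the additive query update (borrowed from end-to-end memory networks), the use of grid-cell feature vectors as memory slots, and all function names below are assumptions made purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def multi_hop_attention(cell_feats, query, hops=2):
    """Attend over image-cell features for a given number of hops.

    cell_feats: list of feature vectors, one per image cell (assumed layout).
    query: embedding of the referring expression (assumed to share the
           same dimensionality as the cell features).
    Returns the attention weights of every hop and the final query vector.
    """
    q = list(query)
    history = []
    for _ in range(hops):
        # Score each cell against the current query and normalise.
        weights = softmax([dot(cell, q) for cell in cell_feats])
        # Attention-weighted summary of the image.
        context = [
            sum(w * cell[i] for w, cell in zip(weights, cell_feats))
            for i in range(len(q))
        ]
        # Hypothetical update rule: add the summary to the query.
        q = [qi + ci for qi, ci in zip(q, context)]
        history.append(weights)
    return history, q


def select_cell(cell_feats, query, hops=2):
    """Index of the cell with the highest attention after the last hop."""
    history, _ = multi_hop_attention(cell_feats, query, hops)
    final = history[-1]
    return max(range(len(final)), key=final.__getitem__)
```

Calling `select_cell(cells, query, hops=k)` with different values of `k` gives a simple way to compare how the selected cell changes with the number of attention layers, which is the kind of comparison the ablation experiment in the abstract refers to.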

Authors: Yugang Li, Haibo Sun, Zhe Chen, Yudan Ding, Siqi Zhou

Affiliations: Academy of Broadcasting Science; School of Electrical and Electronic Engineering

Source: Computers, Materials & Continua (SCIE, EI), 2020, No. 12, pp. 2529-2541 (13 pages)

Funding: This work was supported in part by audio-visual new media laboratory operation and maintenance of the Academy of Broadcasting Science (Grant No. 200304) and in part by the National Key Research and Development Program of China (Grant No. 2019YFB1406201).

Keywords: Stacked attention networks; referring expressions; visual relationship; deep learning

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]
