Abstract
Building on the soft-attention image captioning framework, this paper proposes a class activation mapping-attention method for image captioning. The class activation mapping (CAM) algorithm gives the convolutional features localization information and richer semantic information. As a result, the convolutional features correspond more closely to the words of the description, which addresses the problem of aligning convolutional features with image descriptions, and the generated natural-language description covers the image content as completely as possible. The attention mechanism is further improved with a two-layer long short-term memory (LSTM) network. The new attention mechanism is suited to feature representations of both the current global and local information, so it can select appropriate features when generating each word. Experimental results show that the improved model outperforms the soft attention model and other models on many evaluation metrics. In particular, its BLEU-4 score on the MSCOCO dataset is 16.8% higher than that of the soft attention model. The class activation mapping mechanism therefore aligns image spatial information with description semantics, reduces the loss of key information in the generated sentences, and improves captioning accuracy.
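The abstract combines two ingredients: a class activation map, which scores each spatial location of the last convolutional layer, and soft attention over those locations. The sketch below illustrates both under stated assumptions; it is not the authors' implementation. `class_activation_map` uses the standard CAM definition M(x, y) = sum_k w_k f_k(x, y), where w_k are the target class's weights in a classifier placed after global average pooling. The abstract does not say how CAM is fused with the attention module, so `cam_weighted_regions` uses a hypothetical fusion that scales each location's feature vector by (1 + normalized CAM). `soft_attention` uses the common additive form e_i = w_a · tanh(W_v v_i + W_h h). The hidden state `hidden` stands in for the output of the first LSTM layer in a two-layer design, which is also an assumption, and `W_v`, `W_h`, `w_a` are hypothetical learned parameters.

```python
import math

def class_activation_map(features, class_weights):
    """CAM for one class: M(x, y) = sum_k w_k * f_k(x, y).

    features: K feature maps, each an H x W list of lists.
    class_weights: K weights of the target class in the classifier
    that follows global average pooling.
    """
    h, w = len(features[0]), len(features[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for w_k, f_k in zip(class_weights, features):
        for y in range(h):
            for x in range(w):
                cam[y][x] += w_k * f_k[y][x]
    return cam

def min_max_normalize(cam):
    """Scale CAM values into [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    return [[(v - lo) / span if span > 0 else 0.0 for v in row] for row in cam]

def cam_weighted_regions(features, cam):
    """Hypothetical fusion: one K-dim vector per location, scaled by (1 + CAM)."""
    norm = min_max_normalize(cam)
    h, w = len(cam), len(cam[0])
    regions = []
    for y in range(h):
        for x in range(w):
            scale = 1.0 + norm[y][x]
            regions.append([scale * f_k[y][x] for f_k in features])
    return regions

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(regions, hidden, W_v, W_h, w_a):
    """Additive soft attention: e_i = w_a . tanh(W_v v_i + W_h h).

    Returns the attention weights alpha and the context vector
    sum_i alpha_i * v_i.
    """
    def matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    proj_h = matvec(W_h, hidden)
    scores = []
    for v in regions:
        proj_v = matvec(W_v, v)
        act = [math.tanh(a + b) for a, b in zip(proj_v, proj_h)]
        scores.append(sum(a * b for a, b in zip(w_a, act)))
    alpha = softmax(scores)
    dim = len(regions[0])
    context = [sum(a * v[d] for a, v in zip(alpha, regions)) for d in range(dim)]
    return alpha, context

if __name__ == "__main__":
    # Toy example: K = 2 channels on a 2 x 2 grid.
    features = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
    cam = class_activation_map(features, [0.5, 1.0])
    regions = cam_weighted_regions(features, cam)
    alpha, context = soft_attention(
        regions, hidden=[1.0],
        W_v=[[1.0, 0.0], [0.0, 1.0]], W_h=[[0.0], [0.0]], w_a=[1.0, 1.0])
    print(cam, alpha, context)
```

In this toy example the bottom-right location has the highest CAM value, so it receives the largest feature scaling and the largest attention weight. This shows the intended effect of CAM: the attention step favors spatial locations that the classifier associates with the target class.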
Authors
廖南星
周世斌
张国鹏
程德强
LIAO Nanxing; ZHOU Shibin; ZHANG Guopeng; CHENG Deqiang (School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China; Sun Yueqi Honors College, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China)
Source
Journal of Shandong University (Engineering Science) 《山东大学学报(工学版)》
CAS
CSCD
Peking University Core Journal (北大核心)
2020, No. 4, pp. 28-34 (7 pages)
Funding
Supported by the National Natural Science Foundation of China (61971421).
Keywords
image captioning
attention mechanism
class activation mapping
convolutional neural network
recurrent neural network