Abstract
Image captioning is a composite task that combines natural language processing and computer vision. Existing methods suffer from poor captioning performance and missing semantic information, and the semantic correlation between the model structure and the image features is also insufficient. To address these problems, an image captioning method based on a multimodal recurrent neural network is proposed, optimized with a gated recurrent unit and a convolutional block attention module. To verify the effectiveness of the method, comparative experiments were conducted on the MSCOCO2014 dataset. The results show that the improved method outperforms the original method and other classical algorithms on all evaluation metrics, handles key information in images better, and generates more accurate caption sentences.
Authors
Li Kezheng (李柯徵)
Wang Haiyong (王海涌)
Affiliations: School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China; Gansu Provincial Engineering Research Center for Artificial Intelligence and Graphics & Image Processing, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal
2021, No. 9, pp. 153-159 (7 pages)
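Funding
National Natural Science Foundation of China (51868042)
Lanzhou Jiaotong University "Hundred Outstanding Young Talents Training Program" Fund (2018103)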
Keywords
Image captioning
Multimodal
Gated recurrent unit
Attention mechanism
Neural network