Abstract
Existing visual storytelling methods do not consider the deep correlation between high-level visual features and semantic relation features across different images, and they neglect the topic of the image sequence. This paper proposes a visual storytelling method that fuses visual features and semantic relation features, taking into account both the topic of the whole image sequence and the correlation between visual and semantic relation features across images. The method uses an autoencoder to extract topic features of the image sequence, summarizes the semantic relation features inferred from the entities in the images through these topic features, and then fuses the high-level visual features and semantic relation features with mutual attention. Experiments show that the method generates more consistent and expressive stories and achieves better results on automatic evaluation metrics than existing methods.
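The fusion step can be illustrated with a minimal sketch of mutual attention, in which each set of features attends over the other. This is not the authors' implementation: the scaled dot-product scoring, the concatenation used for fusion, and all names (`attend`, `mutual_attention`, the toy vectors) are assumptions made for illustration only, since the abstract does not specify these details.

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys):
    """For each query vector, return a softmax-weighted average of the key vectors."""
    dim = len(keys[0])
    scale = math.sqrt(dim)
    contexts = []
    for q in queries:
        # Scaled dot-product score between the query and every key (assumed scoring).
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        contexts.append(
            [sum(w * k[j] for w, k in zip(weights, keys)) for j in range(dim)]
        )
    return contexts

def mutual_attention(visual, semantic):
    """Hypothetical fusion: each side is concatenated with its context from the other."""
    vis_ctx = attend(visual, semantic)   # images attend over semantic relations
    sem_ctx = attend(semantic, visual)   # semantic relations attend over images
    fused_visual = [v + c for v, c in zip(visual, vis_ctx)]      # list concatenation
    fused_semantic = [s + c for s, c in zip(semantic, sem_ctx)]
    return fused_visual, fused_semantic

# Toy data: 3 image feature vectors and 2 semantic relation vectors, both of size 2.
visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
semantic = [[0.5, 0.5], [1.0, -1.0]]
fused_visual, fused_semantic = mutual_attention(visual, semantic)
```

Each fused vector keeps the original feature in its first half and carries, in its second half, a weighted summary of the other modality. A trained model would normally use learned projections rather than raw dot products.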
Authors
吴佩伦
蒋勇
高琳
WU Peilun; JIANG Yong; GAO Lin (School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang 621010, Sichuan, China; School of Electronic Engineering, Chengdu University of Information Technology, Chengdu 610225, Sichuan, China)
Source
《西南科技大学学报》
CAS
2022, No. 3, pp. 44-51 (8 pages)
Journal of Southwest University of Science and Technology
Funding
Sichuan Science and Technology Program (2020YFS0316).