Abstract
As a research topic at the intersection of computer vision, multimedia, artificial intelligence, and natural language processing, visual scene description aims to automatically generate one or more sentences that describe the visual scene presented in an image or a video snippet. The richness of visual scene content and the diversity of natural language expression make this a challenging task. This paper reviews existing visual scene description methods and the evaluation of their results. First, it sets out the definition, research tasks, and method categories of visual scene description, and briefly analyzes its relationship to related technologies such as multi-modal retrieval, cross-modal learning, scene classification, and visual relationship detection. It then discusses the main methods, models, and research progress of visual scene description in three categories, and summarizes the growing number of benchmark datasets. Next, it reviews the main metrics for objectively evaluating visual scene description, along with the problems and challenges the technology faces. Finally, it discusses prospects for future applications.
Authors
马苗
王伯龙
吴琦
武杰
郭敏
MA Miao; WANG Bo-Long; WU Qi; WU Jie; GUO Min (Key Laboratory of Modern Teaching Technology of Ministry of Education (Shaanxi Normal University), Xi’an 710062, China; School of Computer Science, Shaanxi Normal University, Xi’an 710119, China)
Source
《软件学报》
EI
CSCD
Peking University Core Journals (北大核心)
2019, Issue 4, pp. 867-883 (17 pages)
Journal of Software
Funding
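National Natural Science Foundation of China (61877038, 61801282, 61601274)
Natural Science Foundation of Shaanxi Province (2018JM6068)
Fundamental Research Funds for the Central Universities (GK201703054, GK201703058)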
Keywords
deep learning
image captioning
video captioning
benchmark dataset
performance evaluation