Abstract
Lip reading, also known as visual speech recognition, aims to decode what a speaker says from the visual information in the speaker's lip movements. It is an important problem in computer vision and pattern recognition, with broad application value in public security, healthcare, national defense, and film and entertainment. In recent years, deep learning has greatly advanced lip reading research. Starting from the definition of the lip reading problem, this paper first describes the scope and significance of lip reading research and analyzes its main difficulties and challenges. It then reviews the current state of the field, organizing, categorizing, and evaluating the mainstream lip reading methods, including both traditional methods and recent methods based on deep learning. Finally, it discusses open problems and possible research directions, with the aim of drawing attention and interest to lip reading and promoting progress on related problems.
Authors
陈小鼎
盛常冲
匡纲要
刘丽
CHEN Xiao-Ding; SHENG Chang-Chong; KUANG Gang-Yao; LIU Li (College of Electronic Science, National University of Defense Technology, Changsha 410073; College of Systems Engineering, National University of Defense Technology, Changsha 410073)
Source
《自动化学报》
EI
CSCD
PKU Core (北大核心)
2020, No. 11, pp. 2275-2301 (27 pages)
Acta Automatica Sinica
Funding
Supported by the National Natural Science Foundation of China (61872379).
Keywords
lip reading
visual speech recognition
spatiotemporal feature extraction
computer vision
deep learning