Literature review of audio-driven cross-modal visual generation algorithms
Abstract Audio-driven cross-modal visual generation algorithms have a wide range of application scenarios and have attracted broad attention from industry and academia in recent years. Audio and vision are the two most important and common modalities in daily life, yet designing an algorithm that can creatively imagine a visual scene corresponding to a given audio signal remains a great challenge, and the existing literature has not studied the problem systematically and comprehensively. This paper surveys existing audio-driven cross-modal visual generation algorithms and divides them into three categories: audio to image, audio to body motion video, and audio to talking face video. For each category, it first describes the specific application fields and the mainstream algorithm pipelines and analyzes the framework technologies involved. It then presents the core ideas, advantages, and disadvantages of the related algorithms in order of technical development and explains their generation performance. Finally, it discusses the opportunities and challenges currently facing the field and suggests directions for future research.
Authors JIANG Lai (姜莱); YU Zhen (于震); WANG Peng-fei (王鹏飞); ZHOU Dong-sheng (周东生); HOU Ya-qing (侯亚庆). Affiliations: Conservatory of Music, Guangdong Polytechnic Normal University, Guangzhou, Guangdong 510665, China; School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China; School of Software, Dalian University, Dalian, Liaoning 116622, China
Source Journal of Graphics (《图学学报》), indexed in CSCD and the Peking University Core Journals list, 2022, Issue 2, pp. 181-188 (8 pages)
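Funding National Natural Science Foundation of China-Liaoning Joint Fund (U1908214); Fundamental Research Funds for the Central Universities (DUT21TD107, DUT20RC(3)039); Liaoning Revitalization Talents Program (XLYC2008017); Liaoning Province Key Research and Development Program (2019JH2/10100030); CCF-Tencent Rhino-Bird Fund (IAGR20210116).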
Keywords cross-modal generation; audio; vision; deep learning; review
