Abstract
This study uses graphics and imaging techniques, together with deep learning models, to digitally simulate video of a real person and to drive the synthesis of digital-human video with text-to-speech (TTS) technology. The model learns the facial and body-movement features of the human figure in 4K source video, so that the AI can imitate the real person by matching mouth shapes to the speech waveform, and the final video is then synthesized.
Authors
杨继红
姜华
刘银颢
Yang Jihong; Jiang Hua; Liu Yinhao (Audio-visual New Media Center, China Media Group, Beijing 100020, China)
Source
《广播与电视技术》
2023, No. 7, pp. 16-24 (9 pages)
Radio & TV Broadcast Engineering