
Multi-View Lip Motion and Voice Consistency Judgment Based on Lip Reconstruction and Three-Dimensional Coupled CNN
Abstract: Traditional methods for judging the consistency of lip motion and voice mainly process frontal lip-motion video, neglecting both the effect of camera-angle changes during video acquisition and the spatio-temporal characteristics of lip movement. Focusing on the influence of lip-view angle on consistency judgment, and exploiting the strengths of three-dimensional (3D) convolutional neural networks in non-linear representation and spatio-temporal feature extraction, this paper proposes a multi-view lip motion and voice consistency judgment method based on frontal lip reconstruction and a 3D coupled convolutional neural network. First, a self-mapping loss is introduced into the generator to improve frontal reconstruction, and a lip-reconstruction method based on a self-mapping supervised cycle-consistent generative adversarial network (SMS-CycleGAN) performs angle classification and frontal reconstruction of multi-view lip images. Second, two heterogeneous 3D convolutional neural networks are designed to describe the audio and video signals, respectively, and to extract 3D convolutional features containing long-term spatio-temporal correlation information. Finally, a contrastive loss function is introduced as the correlation measure for audio-video matching, coupling the outputs of the two networks into a common representation space in which consistency is judged. Experimental results show that the proposed method reconstructs frontal lip images of higher quality and outperforms several different types of comparison methods in consistency-judgment performance.
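The first stage hinges on a self-mapping term added to the CycleGAN generator objective. The sketch below (in PyTorch; the paper does not specify a framework) is a minimal illustration assuming the self-mapping supervision behaves like CycleGAN's identity loss: a frontal lip image passed through the profile-to-frontal generator should reproduce itself. The names `g_p2f` and `lambda_sm` are hypothetical, and the exact form of the SMS-CycleGAN term may differ from this L1 penalty.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def self_mapping_loss(g_p2f: nn.Module,
                      frontal: torch.Tensor,
                      lambda_sm: float = 5.0) -> torch.Tensor:
    """Self-mapping supervision (assumed form): a frontal lip image fed
    through the profile->frontal generator should map onto itself.
    `g_p2f` and `lambda_sm` are illustrative names, not from the paper."""
    reconstructed = g_p2f(frontal)          # G(frontal) should ~ frontal
    return lambda_sm * F.l1_loss(reconstructed, frontal)

# The generator objective would then combine the usual CycleGAN terms
# (adversarial + cycle-consistency) with this extra self-mapping term:
# loss_G = loss_adv + lambda_cyc * loss_cyc + self_mapping_loss(g_p2f, frontal)
```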
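For the second and third stages, the following sketch shows one plausible way to couple two 3D-convolutional branches into a shared representation space using the classic contrastive loss (Hadsell et al.) that the abstract names as the matching measure. The branch architecture and the audio input layout (a stacked spectrogram cube) are assumptions for illustration; the paper's heterogeneous networks are more elaborate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Branch3D(nn.Module):
    """Minimal 3D-conv encoder. The paper uses two *heterogeneous*
    branches, so the real audio and video towers would differ in depth
    and kernel shapes; this single toy tower is illustrative only."""
    def __init__(self, in_ch: int, embed_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_ch, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),           # collapse (T, H, W)
        )
        self.proj = nn.Linear(64, embed_dim)   # map into the shared space

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(self.features(x).flatten(1))

def contrastive_loss(audio_emb: torch.Tensor, video_emb: torch.Tensor,
                     match: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """Pull matched audio/video embeddings together; push mismatched
    pairs at least `margin` apart (`match` is 1.0 or 0.0 per pair)."""
    d = F.pairwise_distance(audio_emb, video_emb)
    return (match * d.pow(2) + (1.0 - match) * F.relu(margin - d).pow(2)).mean()

# Toy usage: an 8-frame RGB lip clip and an assumed spectrogram cube.
video_net, audio_net = Branch3D(in_ch=3), Branch3D(in_ch=1)
clips = torch.randn(4, 3, 8, 32, 32)           # (B, C, T, H, W)
specs = torch.randn(4, 1, 8, 16, 16)           # assumed audio layout
labels = torch.tensor([1.0, 1.0, 0.0, 0.0])    # 1 = consistent pair
loss = contrastive_loss(audio_net(specs), video_net(clips), labels)
```

At test time, the pairwise distance in the shared space would itself serve as the consistency score, thresholded to accept or reject an audio-video pair.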
Authors: ZHU Zhengyu; LUO Chao; HE Qianhua; PENG Weifeng; MAO Zhiwei; ZHANG Shunsi (Audio, Speech and Vision Processing Laboratory, South China University of Technology, Guangzhou 510640, Guangdong, China; School of Cyber Security, Guangdong Polytechnic Normal University, Guangzhou 510665, Guangdong, China; Guangzhou Quwan Network Technology Co., Ltd., Guangzhou 510665, Guangdong, China)
Source: Journal of South China University of Technology (Natural Science Edition), 2023, No. 5, pp. 70-77 (8 pages); indexed in EI, CAS, CSCD, and the Peking University Core Journal list.
Funding: National Natural Science Foundation of China (61672173); National Key R&D Program of China (2018YFB1802200).
Keywords: consistency judgment; generative adversarial network; convolutional neural network; frontal reconstruction; multi-modal