
Voice conversion of different ages using universal background model groups of short-time spectra and prosodic features (cited by: 3)
Abstract: A method is proposed for voice conversion between age groups that combines universal background model groups (UBMG) of short-time spectra with prosodic features. For spectral conversion, linear predictive cepstral coefficients are extracted for each speaker in an age group and a Gaussian mixture model (GMM) is trained per speaker; the speakers are then clustered by the similarity of their voice characteristics, and one universal background model (UBM) is trained per cluster. This yields a UBM group and a corresponding set of short-time spectrum conversion functions for each age group. After spectral conversion, the formants are further fine-tuned. For prosodic conversion, fundamental frequency is modeled by a single Gaussian and speech rate by an average-duration-rate model, and the conversion functions are derived from these models. In objective and subjective evaluations, the proposed method clearly outperforms the conventional bilinear method on ABX and MOS, and its log-likelihood change rate is 4% higher than that of the single-UBM method. These results indicate that the converted speech is closer to the speech of the target age group while retaining good speech quality, a clear improvement over conventional methods.
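Authors: HUI Lin, YU Yibiao (School of Electronic and Information Engineering, Soochow University, Suzhou 215006)
Source: Acta Acustica (《声学学报》; indexed in EI, CSCD, and the Peking University core journal list), 2017, No. 6, pp. 762-768 (7 pages)
Funding: Supported by the National Natural Science Foundation of China (61271360)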
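
The abstract states that fundamental frequency is modeled by a single Gaussian and speech rate by an average-duration-rate model, but it does not give the conversion functions themselves. The sketch below shows one common way such functions are built and is not taken from the paper: it assumes the Gaussian is fitted to log-F0 over voiced frames, that conversion is the usual mean-variance mapping, and that the duration rate is the ratio of mean target duration to mean source duration. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def fit_log_f0(f0_values):
    """Fit a single Gaussian (mean, std) to log-F0 over voiced frames.

    Assumption: unvoiced frames are marked by F0 <= 0 and are skipped.
    """
    logs = [math.log(f) for f in f0_values if f > 0]
    return mean(logs), pstdev(logs)


def convert_f0(f0, src, tgt):
    """Map one F0 value from source statistics to target statistics.

    Assumption: standard mean-variance mapping in the log domain.
    The source standard deviation must be non-zero.
    """
    if f0 <= 0:
        return 0.0  # keep unvoiced frames unvoiced
    (mu_s, sd_s), (mu_t, sd_t) = src, tgt
    return math.exp(mu_t + (sd_t / sd_s) * (math.log(f0) - mu_s))


def duration_ratio(src_durations, tgt_durations):
    """Average-duration ratio: mean target duration over mean source duration.

    Assumption: the direction of the ratio (target over source) is chosen
    here for illustration; a time-scaling step would multiply source
    segment durations by this factor.
    """
    return mean(tgt_durations) / mean(src_durations)


# Toy statistics for two hypothetical age groups (F0 in Hz, durations in s).
src_stats = fit_log_f0([100.0, 0.0, 200.0])
tgt_stats = fit_log_f0([200.0, 400.0])
converted = convert_f0(100.0, src_stats, tgt_stats)
scale = duration_ratio([0.2, 0.4], [0.3, 0.6])
```

Working in the log domain keeps converted F0 values positive and treats pitch differences as ratios, which is why it is a frequent choice for this kind of mapping; whether the authors did the same cannot be determined from the abstract.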
