
Vocal tract length aligning based Mandarin age voice conversion
Cited by: 2
Abstract: An age voice conversion method based on vocal tract length aligning is proposed; it transforms speech of one age into speech of a required target age. The method has two parts, spectral conversion and fundamental frequency (F0) conversion. For spectral conversion, the spectrum of each pitch-marked speech frame is warped in the frequency domain according to a warping factor (the vocal tract factor) and a warping function. For F0 conversion, the F0 feature is converted by a linear transformation. Experiments convert and resynthesize speech of the same speaker recorded at different ages. When older speech is converted to younger speech, the average spectral distance of the converted speech is markedly reduced and the conversion works well; when younger speech is converted to older speech, the reduction in average spectral distance is smaller. Age voice conversion is also better for female voices than for male voices in both effectiveness and naturalness.
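Source: Journal of University of Science and Technology of China (JUSTC), 2015, No. 7, pp. 575-581 (7 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.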
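Funding: Supported by the National Natural Science Foundation of China (61472393) and the Anhui Province Independent Innovation Special Fund (13Z02008).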
Keywords: age voice conversion; vocal tract length aligning; pitch marking; warping factor; warping function; linear transformation
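The abstract names two operations, frequency warping of each frame's spectrum and a linear transformation of F0, but gives neither the warping function nor the coefficients of the linear transform. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a piecewise-linear warping function of the kind commonly used for vocal tract length normalization, and a mean/standard-deviation matching form for the linear F0 transform. The function names, the `f_break` parameter, and the convention `converted(f) = source(w(f))` are all hypothetical choices made for illustration.

```python
import numpy as np


def piecewise_linear_warp(f, alpha, f_break=0.7):
    """Assumed warping function w(f) on normalized frequency (0 = DC, 1 = Nyquist).

    Below f_break the slope is alpha; above it, a second linear segment
    brings the curve back to w(1) = 1 so the band edges stay fixed.
    """
    if not 0.0 < f_break < 1.0 or alpha <= 0.0 or alpha * f_break >= 1.0:
        raise ValueError("need 0 < f_break < 1, alpha > 0 and alpha * f_break < 1")
    f = np.asarray(f, dtype=float)
    upper_slope = (1.0 - alpha * f_break) / (1.0 - f_break)
    return np.where(
        f <= f_break,
        alpha * f,
        alpha * f_break + upper_slope * (f - f_break),
    )


def warp_spectrum(mag, alpha, f_break=0.7):
    """Warp a one-sided magnitude spectrum: converted(f) = source(w(f)).

    With alpha > 1 spectral peaks move to lower frequencies; with
    alpha < 1 they move to higher frequencies (convention assumed here).
    """
    mag = np.asarray(mag, dtype=float)
    f = np.linspace(0.0, 1.0, len(mag))
    # Read the source spectrum at the warped frequencies by linear interpolation.
    return np.interp(piecewise_linear_warp(f, alpha, f_break), f, mag)


def convert_f0(f0, src_mean, src_std, tgt_mean, tgt_std):
    """Linear F0 transform f0' = a * (f0 - src_mean) + tgt_mean, a = tgt_std / src_std.

    Frames with f0 == 0 are treated as unvoiced and left at 0.
    """
    f0 = np.asarray(f0, dtype=float)
    a = tgt_std / src_std
    out = np.zeros_like(f0)
    voiced = f0 > 0
    out[voiced] = a * (f0[voiced] - src_mean) + tgt_mean
    return out


if __name__ == "__main__":
    grid = np.linspace(0.0, 1.0, 101)
    frame = np.exp(-((grid - 0.5) / 0.05) ** 2)  # synthetic single-peak spectrum
    warped = warp_spectrum(frame, alpha=1.25)
    print("peak bin before:", int(np.argmax(frame)), "after:", int(np.argmax(warped)))
    print(convert_f0([0.0, 100.0, 120.0], 110.0, 10.0, 220.0, 20.0))
```

In a pitch-synchronous setting, `warp_spectrum` would be applied to the spectrum of each pitch-marked frame before resynthesis, and the source and target F0 statistics passed to `convert_f0` would be estimated from recordings of the two age groups; both steps are outside this sketch.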











