Voice Conversion Based on Mixed GMM-ANN Model (cited by: 1)
Abstract: Voice conversion based on the Gaussian mixture model (GMM) suffers from over-smoothing. Because the mean vectors of the GMM parameters can represent the spectral envelope shape of the converted features, a voice conversion method based on a mixed model of GMM and artificial neural network (ANN) is proposed. The method alleviates over-smoothing by using an ANN to transform the mean vectors of the GMM parameters. To obtain a continuous converted spectrum, static and dynamic spectral features are combined to approximate the converted spectrum sequence. Because pitch is very important to voice conversion, it is also analyzed and transformed on the basis of the spectral conversion. Finally, the performance of the proposed method is evaluated with subjective and objective tests. The results show that the proposed method obtains better converted speech than the conventional GMM-based voice conversion method.
Source: Journal of Data Acquisition and Processing (《数据采集与处理》), CSCD, PKU Core (北大核心), 2014, No. 2, pp. 227-231 (5 pages)
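Funding: Major Program of Natural Science Research of Jiangsu Higher Education Institutions (13KJA510003); Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD); Graduate Research and Innovation Program of Jiangsu Higher Education Institutions (CXLX12_0478, CXZZ13_0488)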
Keywords: spectral conversion; Gaussian mixture model; radial basis function; neural network
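
The abstract describes the core idea only in words: a GMM models the source features, and an ANN transforms the GMM mean vectors. The following is a minimal sketch of that idea, not the authors' implementation. It assumes diagonal covariances, keeps only the mean term of a GMM conversion function (the covariance correction used in conventional GMM conversion is omitted), and stands in for the trained ANN with an arbitrary callable `map_mean`. All function and parameter names are hypothetical.

```python
import numpy as np

def gmm_posteriors(x, weights, means, variances):
    """Posterior probability of each diagonal-covariance Gaussian given frame x."""
    diff = x - means  # shape (M, D): offset of x from every component mean
    # Log density of x under each component (diagonal covariance assumed).
    log_pdf = -0.5 * (np.sum(diff ** 2 / variances, axis=1)
                      + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_pdf
    log_post -= np.max(log_post)  # subtract the max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

def convert_frame(x, weights, means, variances, map_mean):
    """Posterior-weighted sum of mapped source means (mean term only).

    map_mean is a placeholder for a trained network that maps one source
    mean vector to the target speaker's space; it is an assumption here.
    """
    post = gmm_posteriors(x, weights, means, variances)
    mapped = np.stack([map_mean(m) for m in means])  # shape (M, D)
    return post @ mapped

# Toy usage: two well-separated components and a stand-in "network"
# that simply shifts each mean by one unit.
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [10.0, 10.0]])
variances = np.ones((2, 2))
converted = convert_frame(np.array([0.0, 0.0]), weights, means, variances,
                          lambda m: m + 1.0)
```

In this toy setup the input frame sits on the first component, so the output is dominated by the mapped first mean. A real system would train the mapping on aligned source and target data and would add the dynamic features and pitch conversion mentioned in the abstract.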
References (15)

  • 1. Sun Jian, Zhang Xiongwei, Cao Tieyong, Yang Jibin, Sun Xinjian. Voice conversion method based on convolutive nonnegative matrix factorization[J]. Journal of Data Acquisition and Processing, 2013, 28(2): 141-148. Cited by: 12
  • 2. Abe M, Nakamura S, Shikano K, et al. Voice conversion through vector quantization[C]//IEEE International Conference on Acoustics, Speech and Signal Processing. New York, USA: IEEE, 1988: 655-658. Cited by: 1
  • 3. Stylianou Y, Cappe O, Moulines E. Continuous probabilistic transform for voice conversion[J]. IEEE Transactions on Speech and Audio Processing, 1998, 6(2): 131-142. Cited by: 1
  • 4. Kain A, Macon M W. Spectral voice conversion for text-to-speech synthesis[C]//IEEE International Conference on Acoustics, Speech and Signal Processing. Seattle, WA, USA: IEEE, 1998: 285-288. Cited by: 1
  • 5. Laskar R H, Chakrabarty D, Talukdar F A, et al. Comparing ANN and GMM in a voice conversion framework[J]. Applied Soft Computing, 2012, 12(11): 3332-3342. Cited by: 1
  • 6. Yue Zhenjun, Zou Xiang, Wang Hao. Voice conversion method based on a combination of hidden Markov model and Gaussian mixture model[J]. Journal of Data Acquisition and Processing, 2009, 24(3): 285-289. Cited by: 5
  • 7. Desai S, Black A W, Yegnanarayana B, et al. Spectral mapping using artificial neural networks for voice conversion[J]. IEEE Transactions on Audio, Speech and Language Processing, 2010, 18(5): 954-964. Cited by: 1
  • 8. Rao K S. Voice conversion by mapping the speaker-specific features using pitch synchronous approach[J]. Computer Speech and Language, 2010, 24(3): 474-494. Cited by: 1
  • 9. Chen Yining, Chu Min, Chang Eric, et al. Voice conversion with smoothed GMM and MAP adaptation[C]//8th European Conference on Speech Communication and Technology. Geneva, Switzerland: ISCA Archive, 2003: 2413-2416. Cited by: 1
  • 10. Toda T, Black A W, Tokuda K. Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory[J]. IEEE Transactions on Audio, Speech and Language Processing, 2007, 15(8): 2222-2235. Cited by: 1
