
Voice Conversion from Source Speaker to Target Speaker Based on an RBF Neural Network
Abstract: Source-to-target voice conversion is a technique that transforms a speaker's vocal characteristics, converting the source speaker's speech so that it carries the acoustic characteristics of the target speaker. Vocal-tract formant parameters, estimated by a root-finding method based on LP analysis, are chosen as the features to be converted. To overcome the error that inaccurate classification introduces into the linear multivariate regression (LMR) conversion method, a nonlinear conversion method based on a radial basis function neural network (RBFNN) is used to obtain the conversion rules. Experiments on five Mandarin vowels examine how the number of classes and the training set affect the two conversion methods. The results show that the RBFNN method converts better than the LMR method, and that it still achieves good conversion when only a small amount of training data is available.
Author: Wang Haixiang (王海祥)
Source: Electronic Measurement Technology (《电子测量技术》), 2006, No. 6, pp. 60-63 (4 pages)
Keywords: formant parameters; radial basis function neural network; classified linear transformation; Itakura distance
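
The abstract contrasts a linear regression mapping with a nonlinear RBF network mapping between source and target formant vectors. The sketch below illustrates that contrast only in spirit and is not the paper's implementation. Everything in it is an assumption made for illustration: the data are synthetic (normalized three-dimensional "formant" vectors with an invented nonlinear source-to-target relation), the RBF centers are a random subset of the training points, the output weights are fitted by least squares, and the linear baseline is a single global regression rather than the classified LMR method. All function names are hypothetical.

```python
import numpy as np

def rbf_design(X, centers, width):
    """Gaussian activation of every sample against every center, plus a bias column."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((X.shape[0], 1))])

def fit_rbf(X, Y, n_centers=50, width=0.8, seed=0):
    """Pick centers as a random subset of X (an assumption), then solve the
    output-layer weights by linear least squares."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=n_centers, replace=False)
    centers = X[idx]
    W, *_ = np.linalg.lstsq(rbf_design(X, centers, width), Y, rcond=None)
    return centers, width, W

def predict_rbf(model, X):
    centers, width, W = model
    return rbf_design(X, centers, width) @ W

def fit_linear(X, Y):
    """Single global affine regression (not the paper's classified LMR)."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    W, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return W

def predict_linear(W, X):
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ W

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic stand-in for aligned source/target formant vectors (hypothetical data).
rng = np.random.default_rng(1)
src = rng.uniform(-1.0, 1.0, size=(300, 3))   # normalized source F1..F3
tgt = np.tanh(2.0 * src) + 0.3 * src ** 2     # invented nonlinear mapping
train, test = slice(0, 200), slice(200, 300)

rbf = fit_rbf(src[train], tgt[train])
lin = fit_linear(src[train], tgt[train])
rbf_err = rmse(predict_rbf(rbf, src[test]), tgt[test])
lin_err = rmse(predict_linear(lin, src[test]), tgt[test])
print(f"RBF RMSE: {rbf_err:.3f}  linear RMSE: {lin_err:.3f}")
```

On this toy mapping the RBF model should report a lower held-out error than the global linear fit, simply because the invented relation is nonlinear. That says nothing about real speech data, where the comparison depends on the formant estimates, the classification used by LMR, and the distance measure (the record lists Itakura distance as a keyword).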

