Voice conversion from source speaker to target speaker based on RBF neural network

Abstract: Source-to-target voice conversion is a technique that transforms a speaker's vocal characteristics, converting the speech of a source speaker so that it carries the acoustic characteristics of a target speaker. This paper takes vocal-tract formant parameters, estimated by a root-finding method based on LP analysis, as the features to be converted. To overcome the error caused by inaccurate classification in the linear multivariate regression (LMR) conversion method, a nonlinear conversion method based on a radial basis function neural network (RBFNN) is used to obtain the conversion rules. Experiments on five Mandarin vowels examine how the number of classes and the training set affect the two methods. The results show that the RBFNN method converts better than the LMR method, and that it depends little on the training data, still achieving good conversion when only a small training set is available.
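The nonlinear mapping described in the abstract can be illustrated with a minimal sketch of an RBF network that maps source formant vectors to target formant vectors. Everything concrete here is an assumption for illustration, not taken from the paper: the function names (`rbf_features`, `fit_rbf`, `convert`), the Gaussian width `WIDTH`, the choice to place one center on each training vector and solve the output weights by least squares, and the formant values themselves, which are placeholders rather than measured data.

```python
import numpy as np

WIDTH = 0.3  # Gaussian width in kHz (assumed value, not from the paper)


def rbf_features(x, centers, width):
    """Gaussian hidden-layer activations plus a constant bias column."""
    # Squared Euclidean distance between every input row and every center.
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((x.shape[0], 1))])


def fit_rbf(src, tgt, centers, width):
    """Solve the linear output-layer weights by least squares."""
    phi = rbf_features(src, centers, width)
    weights, *_ = np.linalg.lstsq(phi, tgt, rcond=None)
    return weights


def convert(x, centers, width, weights):
    """Map source formant vectors to predicted target formant vectors."""
    return rbf_features(x, centers, width) @ weights


# Placeholder (F1, F2, F3) vectors in kHz for five vowels; illustrative only.
src = np.array([
    [0.80, 1.20, 2.60],
    [0.30, 2.30, 3.00],
    [0.35, 0.80, 2.40],
    [0.50, 1.90, 2.60],
    [0.55, 0.90, 2.50],
])
tgt = np.array([
    [0.95, 1.40, 2.90],
    [0.35, 2.70, 3.30],
    [0.40, 0.90, 2.70],
    [0.60, 2.20, 2.95],
    [0.65, 1.00, 2.80],
])

# One center per training vector: the network interpolates the training pairs.
W = fit_rbf(src, tgt, src, WIDTH)
converted = convert(src, src, WIDTH, W)
```

With one center per training vector the network reproduces the training pairs and interpolates smoothly between them. A practical system would more likely use fewer centers than training frames and evaluate on held-out speech.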

Author: Wang Haixiang

Affiliation: Department of Electronic Science and Technology, University of Science and Technology of China

Source: Electronic Measurement Technology (《电子测量技术》), 2006, No. 6, pp. 60-63 (4 pages)

Keywords: formant parameters; radial basis function neural network; classified linear transformation; Itakura distance

Classification code: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]
