
Minimum Generation Error Based Optimization of HMM Model Clustering for Speech Synthesis
Abstract To improve decision tree clustering and avoid over-trained or under-trained clustered models, a method is proposed that selects the minimum description length (MDL) factor by cross-validation (CV) under a minimum generation error criterion. The generation error computed in cross-validation is used to choose the MDL factor, which in turn determines the size of the decision tree. Subjective and objective tests show that, compared with the conventional approach of setting a fixed MDL threshold, the proposed method yields synthesized speech with better quality and naturalness than the baseline system.
Source Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), indexed in EI, CSCD and the Peking University Core Journals list, 2010, No. 6, pp. 822-828 (7 pages)
Keywords Hidden Markov Model (HMM); Speech Synthesis; Decision Tree Clustering; Minimum Description Length (MDL); Cross-Validation (CV)
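
The selection procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how folds are built, how the generation error is measured, or which MDL factors are tried. The function name `select_mdl_factor` and the two caller-supplied callables `train` (builds a clustered model from training utterances and an MDL factor) and `generation_error` (scores one held-out utterance) are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, Sequence, Tuple, TypeVar

U = TypeVar("U")  # one utterance, in whatever form the caller uses (assumption)
M = TypeVar("M")  # a trained, clustered model (assumption)


def select_mdl_factor(
    utterances: Sequence[U],
    candidates: Sequence[float],
    train: Callable[[Sequence[U], float], M],
    generation_error: Callable[[M, U], float],
    n_folds: int = 5,
) -> Tuple[float, Dict[float, float]]:
    """Pick the MDL factor with the lowest cross-validated generation error.

    `train` and `generation_error` are hypothetical hooks; the real
    clustering and error measure are outside the scope of this sketch.
    """
    if not candidates:
        raise ValueError("at least one candidate MDL factor is required")
    if n_folds < 2 or n_folds > len(utterances):
        raise ValueError("n_folds must be between 2 and the number of utterances")

    # Interleaved split: utterance i goes to fold i mod n_folds.
    folds = [list(utterances[i::n_folds]) for i in range(n_folds)]

    scores: Dict[float, float] = {}
    for factor in candidates:
        total, count = 0.0, 0
        for k, held_out in enumerate(folds):
            # Train on every fold except the k-th one.
            train_set = [u for j, fold in enumerate(folds) if j != k for u in fold]
            model = train(train_set, factor)
            # Accumulate the error on utterances the model has not seen.
            total += sum(generation_error(model, u) for u in held_out)
            count += len(held_out)
        scores[factor] = total / count  # mean held-out generation error

    best = min(scores, key=scores.get)
    return best, scores
```

A factor that is too small grows a large tree and tends to raise the held-out error through over-training, while a factor that is too large prunes the tree too aggressively; comparing mean held-out errors across candidates is one simple way to balance the two.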