
Mapping methods for output-based objective speech quality assessment using data mining (cited by: 2)

Abstract: Objective speech quality is difficult to measure without the input reference speech. Mapping methods using data mining are investigated and designed to improve the output-based speech quality assessment algorithm. The degraded speech is first separated into three classes (unvoiced, voiced and silence). For each class, the consistency measurement between the degraded speech signal and the pre-trained reference model is then calculated and mapped to an objective speech quality score using data mining. A fuzzy Gaussian mixture model (GMM) is used to generate the artificial reference model, which is trained on perceptual linear predictive (PLP) features. The mean opinion score (MOS) mapping methods, including multivariate non-linear regression (MNLR), fuzzy neural network (FNN) and support vector regression (SVR), are designed and compared with the standard ITU-T P.563 method. Experimental results show that the assessment methods with data mining perform better than ITU-T P.563. Moreover, FNN and SVR are more efficient than MNLR, and FNN performs best, with a 14.50% increase in the correlation coefficient and a 32.76% decrease in the root-mean-square MOS error.
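The two figures of merit named in the abstract, the correlation coefficient and the root-mean-square MOS error, can be sketched as follows. This is a minimal illustration that assumes the usual Pearson correlation and RMSE definitions. The function names and the toy MOS values are hypothetical and are not taken from the paper.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, subjective):
    """Root-mean-square error between predicted and subjective MOS."""
    n = len(predicted)
    squared = sum((p - s) ** 2 for p, s in zip(predicted, subjective))
    return math.sqrt(squared / n)


def relative_change(baseline, new):
    """Percentage change of `new` relative to `baseline`."""
    return 100.0 * (new - baseline) / baseline


if __name__ == "__main__":
    # Toy values for illustration only; not data from the paper.
    subjective_mos = [1.5, 2.0, 3.0, 3.5, 4.5]
    predicted_mos = [1.8, 2.1, 2.7, 3.6, 4.2]
    print("correlation:", pearson_correlation(predicted_mos, subjective_mos))
    print("RMSE:", rmse(predicted_mos, subjective_mos))
```

Given a baseline and a candidate method scored this way, `relative_change` expresses an improvement as a percentage, which is one plausible reading of the "14.50% increase" and "32.76% decrease" figures. The abstract does not spell out the exact baseline or formula.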
Source: Journal of Central South University (中南大学学报(英文版)); indexed in SCIE, EI, CAS; 2014, No. 5, pp. 1919-1926 (8 pages)
Funding: Projects (61001188, 1161140319) supported by the National Natural Science Foundation of China; Project (2012ZX03001034) supported by the National Science and Technology Major Project; Project (YETP1202) supported by the Beijing Higher Education Young Elite Teacher Project, China
Keywords: objective speech quality; data mining; multivariate non-linear regression; fuzzy neural network; support vector regression; speech quality; mapping method; quality assessment; consistency measurement; ITU-T
