
Maximum likelihood polynomial regression for robust speech recognition
Abstract The linearity assumption is the main weakness of maximum likelihood linear regression (MLLR). This paper applies polynomial regression to model adaptation and builds a nonlinear model adaptation algorithm, maximum likelihood polynomial regression (MLPR), for robust speech recognition. Working in the log-spectral domain, the algorithm uses a polynomial in each Mel subband to approximate the nonlinear relationship between the model means of the training environment and those of the recognition (test) environment. The polynomial coefficients are estimated from a small amount of adaptation data collected in the test environment, using the expectation-maximization (EM) algorithm under the maximum likelihood (ML) criterion. Experimental results show that a second-order polynomial already approximates the nonlinear environmental transformation of the model means well. In both the noise compensation and the speaker adaptation experiments, the word error rates of MLPR are significantly lower than those of MLLR. The proposed algorithm largely overcomes the limitation imposed by the linearity assumption of linear model adaptation and can simultaneously reduce the impact of noise, speaker changes, and other factors on a speech recognition system. It is especially suitable for joint adaptation to speaker and noise.
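Authors Lü Yong (吕勇), Wu Zhenyang (吴镇扬)
Source Acta Acustica (《声学学报》), 2010, No. 1, pp. 88-96 (9 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
Funding National 973 Program of China (2002CB312102); National Natural Science Foundation of China (60672094)
Keywords maximum likelihood criterion; speech recognition system; polynomial regression; linear regression algorithm; speaker adaptation; model adaptation; nonlinear model; adaptive algorithm; blind source separation; error compensation; polynomials; regression analysis; speech recognition; statistical tests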
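
The abstract describes estimating, for each Mel subband, the coefficients of a polynomial that maps training-environment means to test-environment means under the ML criterion. The following is a minimal sketch of what one such M-step could look like, not the authors' implementation. It assumes scalar Gaussians defined directly in the log-spectral domain, variances left unchanged by adaptation, and occupation posteriors already computed in an E-step. The function names `estimate_poly_coeffs` and `adapt_means` are hypothetical. Under these assumptions, maximizing the EM auxiliary function reduces to a weighted least-squares problem in the polynomial coefficients.

```python
import numpy as np


def estimate_poly_coeffs(mu, var, gamma, x, order=2):
    """Illustrative M-step for one Mel subband (assumptions, not the paper's code).

    mu:    (M,) training-environment Gaussian means in this subband
    var:   (M,) Gaussian variances, assumed unchanged by adaptation
    gamma: (T, M) occupation posteriors from an E-step
    x:     (T,) adaptation observations in this subband
    Returns coefficients a_0..a_order, lowest order first.
    """
    # Row m of the design matrix is [1, mu_m, mu_m^2, ...].
    V = np.vander(mu, N=order + 1, increasing=True)
    occupancy = gamma.sum(axis=0)      # sum_t gamma_m(t)
    weighted_obs = gamma.T @ x         # sum_t gamma_m(t) * x_t
    # Normal equations of the weighted least-squares problem that
    # maximizes the auxiliary function with respect to the coefficients.
    A = (V * (occupancy / var)[:, None]).T @ V
    b = V.T @ (weighted_obs / var)
    return np.linalg.solve(A, b)


def adapt_means(mu, coeffs):
    """Apply the estimated polynomial to the training-environment means."""
    return np.polynomial.polynomial.polyval(mu, coeffs)
```

With `order=1` the same routine reduces to a per-subband linear (scale plus bias) transform of the means, which is one way to see the polynomial model as a generalization of a linear mean transform. The matrix `A` is only invertible when the subband contains more than `order` distinct mean values with nonzero occupancy.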