
A New Subspace-Based Speaker Adaptation Method (cited by: 3)
Abstract: A new fast speaker adaptation method based on subspace modeling is proposed. Building on eigen-voice (EV) adaptation, which finds a low-dimensional speaker subspace, the method finds a second low-dimensional subspace in the phone space, yielding a more compact joint "speaker-phone" subspace. This subspace captures the correlations of hidden Markov model (HMM) parameters across speakers (inter-speaker variability) and also explicitly models the correlations across phones (intra-speaker variability), so it greatly reduces model storage while reflecting the prior information about the model parameters more completely. In unsupervised adaptation experiments on large vocabulary continuous speech recognition, the new method outperformed the baseline maximum likelihood linear regression (MLLR) and clustered maximum-likelihood linear basis methods, especially when less than 30 s of adaptation data was available.
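The two-stage idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' algorithm: the record does not give the formulation, so the sketch assumes that each speaker model is a supervector of P phone-unit means of dimension D, that the eigen-voices come from plain PCA over speakers, and that the second subspace is a rank-r factorization of those eigen-voices along the phone axis. All function names, sizes, and the synthetic data are hypothetical, and maximum-likelihood estimation of speaker weights is omitted.

```python
import numpy as np

def eigenvoice_basis(supervectors, k):
    """PCA over speaker supervectors; returns (mean, basis of shape (k, P*D))."""
    mean = supervectors.mean(axis=0)
    centered = supervectors - mean
    # Rows of vt are orthonormal principal directions (the eigen-voices).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def compress_over_phones(basis, n_phones, dim, r):
    """Assumed second stage: rank-r factorization along the phone axis."""
    k = basis.shape[0]
    # Rearrange so that each row collects one phone unit across all eigen-voices.
    by_phone = (basis.reshape(k, n_phones, dim)
                     .transpose(1, 0, 2)
                     .reshape(n_phones, k * dim))
    u, s, vt = np.linalg.svd(by_phone, full_matrices=False)
    phone_coords = u[:, :r] * s[:r]   # (P, r): coordinates of each phone unit
    shared = vt[:r]                   # (r, k*D): basis shared by all phone units
    return phone_coords, shared

def reconstruct_basis(phone_coords, shared, k, dim):
    """Rebuild the (k, P*D) eigen-voice basis from the compact factors."""
    n_phones = phone_coords.shape[0]
    by_phone = phone_coords @ shared
    return (by_phone.reshape(n_phones, k, dim)
                    .transpose(1, 0, 2)
                    .reshape(k, n_phones * dim))

# Synthetic training speakers whose models really do have low-rank phone structure.
S, P, D, K, R = 20, 30, 4, 5, 3       # speakers, phone units, feature dim, ranks
rng = np.random.default_rng(0)
A = rng.standard_normal((P, R))       # hidden phone-space factors
B = rng.standard_normal((K, R, D))
E = np.einsum("pr,krd->kpd", A, B).reshape(K, P * D)
X = rng.standard_normal(P * D) + rng.standard_normal((S, K)) @ E

mean, basis = eigenvoice_basis(X, K)
phone_coords, shared = compress_over_phones(basis, P, D, R)
rebuilt = reconstruct_basis(phone_coords, shared, K, D)

full_size = basis.size                        # parameters in the plain EV basis
compact_size = phone_coords.size + shared.size
```

With these toy sizes the plain eigen-voice basis holds K·P·D = 600 numbers, while the factored form holds P·R + R·K·D = 150, and the factored form reproduces the basis up to numerical error because the synthetic data was built with exactly rank-R phone structure. On real models the second factorization would be an approximation.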
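Source: Acta Automatica Sinica (《自动化学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2011, No. 12, pp. 1495-1502 (8 pages)
Funding: Supported by the National Natural Science Foundation of China (60872142, 61005019, 61175017)
Keywords: continuous speech recognition; speaker adaptation; eigen-voice (EV); eigen-phone (EP)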

References (12)

  • 1 Li Husheng, Liu Jia, Liu Runsheng. Current status and development trends of speaker adaptation research in speech recognition [J]. Acta Electronica Sinica, 2003, 31(1): 103-108. Cited by: 32
  • 2 Woodland P C. Speaker adaptation: techniques and challenges. In: Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding. USA: IEEE, 1999. 85-90. Cited by: 1
  • 3 Mak B K W, Lai T C, Tsang I W, Kwok J T Y. Maximum penalized likelihood kernel regression for fast adaptation. IEEE Transactions on Audio, Speech, and Language Processing, 2009, 17(7): 1372-1381. Cited by: 1
  • 4 Kuhn R, Junqua J C, Nguyen P, Niedzielski N. Rapid speaker adaptation in eigenvoice space. IEEE Transactions on Speech and Audio Processing, 2000, 8(6): 695-707. Cited by: 1
  • 5 Teng W X, Gravier C, Bimbot F, Soufflet F. Speaker adaptation by variable reference model subspace and application to large vocabulary speech recognition. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Taipei, China: IEEE, 2009. 4381-4384. Cited by: 1
  • 6 Kenny P, Boulianne G, Dumouchel P. Eigenvoice modeling with sparse training data. IEEE Transactions on Speech and Audio Processing, 2005, 13(3): 345-354. Cited by: 1
  • 7 Mak B, Lai T C, Hsiao R. Improving reference speaker weighting adaptation by the use of maximum-likelihood reference speakers. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Toulouse, France: IEEE, 2006. 229-232. Cited by: 1
  • 8 Tang Y, Rose R. Rapid speaker adaptation using clustered maximum-likelihood linear basis with sparse training data. IEEE Transactions on Audio, Speech, and Language Processing, 2008, 16(3): 607-616. Cited by: 1
  • 9 Jeong Y. Speaker adaptation based on the multilinear decomposition of training speaker models. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Dallas, USA: IEEE, 2010. 4870-4873. Cited by: 1
  • 10 Kenny P, Boulianne G, Ouellet P, Dumouchel P. Speaker adaptation using an eigenphone basis. IEEE Transactions on Speech and Audio Processing, 2004, 12(6): 579-589. Cited by: 1

Second-level references: 2

Co-citing literature: 31

Co-cited literature: 28

  • 1 A. Acero, X. Huang. Augmented Cepstral Normalization for Robust Speech Recognition [C]. Proc. of IEEE Automatic Speech Recognition Workshop. Snowbird, Utah, USA: 1995. Cited by: 1
  • 2 P. Jain, H. Hermansky. Improved Mean and Variance Normalization for Robust Speech Recognition [C]. Proceedings of 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Salt Lake City, Utah, USA: 2001. Cited by: 1
  • 3 D. Reynolds, T. Quatieri, R. Dunn. Speaker Verification Using Adapted Gaussian Mixture Models [J]. Digital Signal Processing, 2000, 10: 19-41. Cited by: 1
  • 4 N. Dehak, P. J. Kenny, R. Dehak, P. Dumouchel, P. Ouellet. Front-end Factor Analysis for Speaker Verification [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(4): 788-798. Cited by: 1
  • 5 Y. Zhang, J. Xu, Z.-J. Yan, Q. Huo. An i-vector Based Approach to Training Data Clustering for Improved Speech Recognition. Proc. Interspeech-2011. Cited by: 1
  • 6 J. Xu, Y. Zhang, Z.-J. Yan, Q. Huo. An i-vector Based Approach to Acoustic Sniffing for Irrelevant Variability Normalization Based Acoustic Model Training and Speech Recognition. Proc. Interspeech-2011: 1701-1704. Cited by: 1
  • 7 J. Xu, Y. Zhang, Z.-J. Yan, Q. Huo. A New i-vector Approach and Its Application to Irrelevant Variability Normalization Based Acoustic Model Training. MLSP-2011, Beijing, China, 6 pages. Cited by: 1
  • 8 Sun Shenghe, Lu Zheming. Vector Quantization Technology and Applications [M]. Beijing: Science Press, 2002. Cited by: 1
  • 9 C.-H. Lee, C.-H. Lin, B.-H. Juang. A Study on Speaker Adaptation of the Parameters of Continuous Density Hidden Markov Models [J]. IEEE Transactions on Signal Processing, 1991, 39(4): 806-814. Cited by: 1
  • 10 M. J. F. Gales, P. C. Woodland. Mean and Variance Adaptation within the MLLR Framework. Computer Speech and Language, 1996, 10(4): 249-264. Cited by: 1

Citing literature: 3

Second-level citing literature: 20
