Abstract
A new fast subspace-based speaker adaptation method is proposed. Building on eigen-voice (EV) adaptation, the method further searches for a low-dimensional subspace within the phone space, yielding a more compact joint "speaker–phone" subspace. This subspace captures not only the inter-speaker correlations of the model parameters but also explicitly models the inter-phone correlations, greatly reducing model storage while reflecting the prior information of the model parameters more comprehensively. In unsupervised adaptation experiments on continuous speech recognition, with small amounts of adaptation data, the new method outperformed maximum likelihood linear regression (MLLR) and the clustered maximum-likelihood linear basis method.
A new speaker adaptation method based on subspace modeling is proposed. After performing eigen-voice (EV) analysis and finding the speaker subspace, another low-dimensional subspace is found in the phone space. The new subspace captures the inter-speaker variability as well as the inter-phone variability of the hidden Markov model (HMM) parameters. This joint speaker-phone subspace is both robust and compact. In large vocabulary continuous speech recognition experiments, the new method showed better unsupervised adaptation performance than the baseline maximum likelihood linear regression and clustered maximum-likelihood linear basis adaptation methods, especially when the adaptation data were less than 30 s.
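The eigen-voice component underlying the method can be illustrated with a minimal sketch: a speaker's mean supervector is modeled as the speaker-independent mean plus a weighted combination of eigenvoices, and adaptation reduces to estimating a few subspace coordinates from limited data. The sketch below uses a plain least-squares estimate with synthetic data; all names, dimensions, and the estimator itself are illustrative assumptions, not the paper's exact ML formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: D = supervector dimension, K = speaker subspace dimension
D, K = 200, 5

m = rng.normal(size=D)        # speaker-independent mean supervector
E = rng.normal(size=(K, D))   # K eigenvoices (rows), e.g. from PCA over training speakers

# Assume a target speaker's true supervector lies in the speaker subspace:
w_true = rng.normal(size=K)
s_true = m + E.T @ w_true

# From limited adaptation data we only observe a noisy estimate of s:
s_obs = s_true + 0.1 * rng.normal(size=D)

# Least-squares estimate of the speaker coordinates w (a simplification of
# the ML estimation used in eigenvoice adaptation):
w_hat, *_ = np.linalg.lstsq(E.T, s_obs - m, rcond=None)

# Projecting onto the low-dimensional subspace suppresses most of the noise,
# so the adapted mean should be closer to the truth than the raw observation.
s_adapted = m + E.T @ w_hat
print(np.linalg.norm(s_adapted - s_true) < np.linalg.norm(s_obs - s_true))
```

The point of the low-dimensional subspace is exactly this robustness: only K coordinates must be estimated, so very little adaptation data suffices. The paper's contribution is to apply the same idea a second time, in the phone space, giving a joint speaker-phone subspace.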
Source
《自动化学报》
EI
CSCD
PKU Core Journal (北大核心)
2011, Issue 12, pp. 1495-1502 (8 pages)
Acta Automatica Sinica
Funding
Supported by the National Natural Science Foundation of China (60872142, 61005019, 61175017)
Keywords
Continuous speech recognition
Speaker adaptation
Eigen-voice (EV)
Eigen-phone (EP)