Abstract
This paper studies two methods for improving posterior probability estimation in the pronunciation evaluation of freely spoken speech under a deep neural network (DNN) acoustic modeling framework: 1) an RNN language model re-scores the N-best candidates from first-pass decoding, yielding more accurate recognition results from which the posterior probability is re-estimated; 2) borrowing from the multilingual neural network training framework, clustered states from dialect data are added to the output nodes of the decoding network, so that a dialect likelihood score enters the posterior probability estimate and reflects the degree of dialect influence. Experiments show that the two methods raise the correlation between the estimated posterior probabilities and human scores by 3.5% and 1.0% absolute, respectively, and by 4.9% absolute when combined. In a real evaluation task, scoring features based on the improved posterior probabilities raise the overall correlation between machine scores and human scores by 2.2% absolute.
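The first method, re-scoring first-pass N-best candidates with an RNN language model, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the hypothesis layout, the names `rescore_nbest` and `toy_rnn_lm`, the interpolation of RNN and first-pass language model log-scores, and the weights `lm_weight` and `interp`.

```python
from typing import Callable, List, Tuple

# Hypothetical layout of one first-pass hypothesis:
# (words, acoustic log-score, first-pass language model log-score)
Hypothesis = Tuple[List[str], float, float]


def rescore_nbest(
    nbest: List[Hypothesis],
    rnn_lm_logprob: Callable[[List[str]], float],
    lm_weight: float = 10.0,  # assumed language model scale factor
    interp: float = 0.5,      # assumed weight of the RNN LM in the interpolation
) -> List[Tuple[float, List[str]]]:
    """Return (combined score, words) pairs sorted best first."""
    rescored = []
    for words, am_score, first_pass_lm in nbest:
        # Interpolate the RNN LM log-score with the first-pass LM log-score.
        lm_score = interp * rnn_lm_logprob(words) + (1.0 - interp) * first_pass_lm
        rescored.append((am_score + lm_weight * lm_score, words))
    rescored.sort(key=lambda item: item[0], reverse=True)
    return rescored


def toy_rnn_lm(words: List[str]) -> float:
    """Stand-in for a trained RNN LM: prefers sentences ending in 'tea'."""
    return -2.0 if words[-1] == "tea" else -8.0


nbest = [
    (["i", "like", "see"], -100.0, -9.0),  # first-pass best hypothesis
    (["i", "like", "tea"], -101.0, -9.5),
]
best_score, best_words = rescore_nbest(nbest, toy_rnn_lm)[0]
print(best_words, best_score)
```

In this toy case the second candidate overtakes the first-pass best after re-scoring. In the setting the abstract describes, the re-ranked top hypothesis would then serve as the recognition result from which the posterior probability is re-estimated.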
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journals (北大核心)
2017, Issue 2, pp. 212-219 (8 pages)
Funding
National Natural Science Foundation of China (61273264)
Keywords
freely spoken speech
pronunciation quality evaluation
posterior probability
deep neural network
RNN language model