Abstract
Hidden Markov model (HMM) systems for large vocabulary continuous speech recognition (LVCSR) typically extract short-term acoustic feature vectors such as MFCC and PLP and use Gaussian mixture models (GMMs) to estimate the distributions of the decorrelated features corresponding to phoneme units; this approach has achieved good recognition results. However, short-term features do not effectively capture the correlation between consecutive frames and are not explicitly optimized for phone discrimination. In this paper, two kinds of multi-layer perceptrons (MLPs) are therefore used to produce two types of discriminative features that estimate posterior phone probabilities at the frame level. These neural-network features are combined with the conventional features into a new feature stream, which is used to rebuild the GMM-HMM (GMHMM) acoustic model. Comparative experiments show that an LVCSR system using the combined acoustic features achieves an absolute reduction in character error rate (CER) of about 3% to 7%.
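The feature combination described in the abstract can be illustrated with a minimal sketch. The record does not give the paper's actual procedure, so everything here is an assumption: the function names, the use of simple concatenation, the log compression of the posteriors (a common choice in tandem-style systems), and the toy dimensions are all hypothetical.

```python
import math

def softmax(scores):
    """Convert raw per-phone scores into posterior probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def combine_features(base_frame, mlp_scores_a, mlp_scores_b, floor=1e-10):
    """Append log phone posteriors from two MLPs to one base feature frame.

    base_frame: conventional short-term features for one frame (e.g. MFCC or PLP).
    mlp_scores_a, mlp_scores_b: hypothetical output-layer activations of the two MLPs.
    """
    combined = list(base_frame)
    for scores in (mlp_scores_a, mlp_scores_b):
        posteriors = softmax(scores)
        # Assumption: take the log of the floored posteriors before appending them.
        combined.extend(math.log(max(p, floor)) for p in posteriors)
    return combined

# Toy example: a 3-dimensional base frame and two MLPs over 4 phone classes.
frame = [1.2, -0.3, 0.7]
features = combine_features(frame, [2.0, 0.1, -1.0, 0.5], [0.0, 0.0, 0.0, 0.0])
print(len(features))  # 3 base values + 4 + 4 log posteriors
```

In a real system the combined vectors would be computed for every frame and then used to train the GMM-HMM acoustic model; that training step is outside the scope of this sketch.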
Source
Journal of Yunnan University (Natural Sciences Edition) (《云南大学学报(自然科学版)》)
CAS
CSCD
Peking University Core Journal (北大核心)
2010, Issue S1, pp. 368-371 (4 pages)
Keywords
acoustic features
discriminative features
artificial neural networks (ANN)
multi-layer perceptron (MLP)