Abstract
To improve the accuracy of speech recognition systems in complex acoustic scenes, monaural speech enhancement has been adopted as front-end processing in robust automatic speech recognition (ASR) systems. Although existing monaural speech enhancement techniques improve recognition accuracy under reverberant conditions, they have not significantly improved accuracy for speech corrupted by wide-band non-stationary noise. To address this problem, this paper proposes a monaural speech enhancement method based on an auditory-masking generative adversarial network. Through the adversarial process between an auditory-masking enhancement model and a discriminator, the enhanced speech features are driven to follow the probability distribution of the target speech. Experimental results show that, in terms of recognition accuracy, the proposed auditory-masking generative adversarial network outperforms existing enhancement methods: it achieves a 19.50% relative reduction in word error rate (WER), significantly improving the noise robustness of the speech recognition system.
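The abstract names three ingredients: a masking-based enhancement model, an adversarial process with a discriminator, and a relative word error rate reduction. The following is a minimal NumPy sketch of those ideas under stated assumptions, not the authors' implementation. The function names, the use of the standard binary cross-entropy GAN objective, and the example error rates (20.0 and 16.1) are all hypothetical; the abstract does not specify the network architecture, the loss, or the absolute error rates.

```python
import numpy as np


def apply_mask(noisy_mag, mask):
    """Scale each time-frequency bin of a noisy magnitude spectrogram by a mask.

    Assumption: the mask is a ratio-style mask, clipped to [0, 1].
    """
    return np.clip(mask, 0.0, 1.0) * noisy_mag


def bce(prob, label):
    """Binary cross-entropy between predicted probabilities and 0/1 labels."""
    prob = np.clip(prob, 1e-7, 1.0 - 1e-7)  # avoid log(0)
    return float(-np.mean(label * np.log(prob) + (1.0 - label) * np.log(1.0 - prob)))


def discriminator_loss(d_real, d_fake):
    """Standard GAN discriminator loss: target features -> 1, enhanced features -> 0."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))


def generator_loss(d_fake):
    """Standard GAN generator loss: the enhancer tries to make enhanced features score 1."""
    return bce(d_fake, np.ones_like(d_fake))


def relative_wer_reduction(wer_baseline, wer_enhanced):
    """Relative WER reduction: (baseline - enhanced) / baseline."""
    return (wer_baseline - wer_enhanced) / wer_baseline


# Hypothetical numbers chosen only to illustrate the arithmetic.
noisy = np.array([[2.0, 4.0]])
mask = np.array([[0.5, 1.5]])  # the second value is clipped to 1.0
enhanced = apply_mask(noisy, mask)
reduction = relative_wer_reduction(20.0, 16.1)
```

With these illustrative inputs, a baseline WER of 20.0% falling to 16.1% corresponds to a relative reduction of 19.5%, the same form of figure the abstract reports. In an actual system the mask would be predicted by a trained network, and the discriminator scores would come from a second network trained jointly with it.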
Authors
DU Zhihao; HAN Jiqing (School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)
Source
Intelligent Computer and Applications (《智能计算机与应用》)
2021, Issue 3, pp. 209-214 (6 pages)
Funding
National Key Research and Development Program of China (2017YFB1002102)
Keywords
auditory masking
generative adversarial network
monaural speech enhancement
robust speech recognition