Abstract
Noisy speech can be regarded as a mixture of an independent noise signal and a speech signal. Traditional speech enhancement methods must make assumptions about the independence and feature distributions of the noise and clean speech signals; unreasonable assumptions lead to residual noise and speech distortion, and hence to poor enhancement. The randomness and abrupt variation of noise further reduce the robustness of traditional methods. To address these problems, this paper uses a generative adversarial network for speech enhancement and presents a method based on the Wasserstein-distance generative adversarial network (Wasserstein generative adversarial nets, WGAN), which speeds up training and stabilizes the training process. The method requires no manual extraction of acoustic features, improves the generalization ability of the speech enhancement system, and performs well on both matched and unmatched noise sets. Experimental results show that the trained end-to-end speech enhancement model raises the objective speech quality measure PESQ (perceptual evaluation of speech quality) by 23.97% on average.
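The abstract names the Wasserstein objective but does not spell it out. As a rough illustration only, the sketch below shows the critic and generator losses from the generic WGAN formulation, together with weight clipping. The function names, the clipping bound `c=0.01`, and the use of weight clipping at all are assumptions taken from the standard WGAN recipe, not details confirmed by this paper.

```python
import numpy as np

def critic_loss(real_scores, fake_scores):
    """Loss minimized by the critic.

    The critic tries to maximize E[D(clean)] - E[D(enhanced)],
    so the quantity to minimize is the negative of that gap.
    """
    return float(np.mean(fake_scores) - np.mean(real_scores))

def generator_loss(fake_scores):
    """Loss minimized by the generator (the enhancement network).

    The generator tries to raise the critic's score on its outputs.
    """
    return float(-np.mean(fake_scores))

def clip_weights(weights, c=0.01):
    """Clamp every critic parameter to [-c, c].

    Assumption: weight clipping is the Lipschitz constraint used in the
    generic WGAN recipe; the paper may use a different constraint.
    """
    return [np.clip(w, -c, c) for w in weights]

# Hypothetical critic scores for a small batch of clean and enhanced speech.
clean_scores = np.array([1.0, 3.0])
enhanced_scores = np.array([0.0, 1.0])
print(critic_loss(clean_scores, enhanced_scores))
print(generator_loss(enhanced_scores))
```

Unlike the original GAN loss, these scores are unbounded real values rather than probabilities, which is one reason the Wasserstein formulation tends to give more stable gradients during training.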
Authors
王怡斐
韩俊刚
樊良辉
WANG Yifei;HAN Jungang;FAN Lianghui(Xi’an University of Posts & Telecommunications, Xi’an 710121, P. R. China)
Source
《重庆邮电大学学报(自然科学版)》
CSCD
Peking University Core Journal (北大核心)
2019, No. 1, pp. 136-142 (7 pages)
Journal of Chongqing University of Posts and Telecommunications(Natural Science Edition)
Funding
Key Program of the National Natural Science Foundation of China (61136002)
Keywords
speech enhancement
generative adversarial networks
convolutional neural network
deep learning