Abstract
With the rise of artificial intelligence, a wide range of deep learning models has emerged, and the generative adversarial network (GAN) has become a research hotspot among them. GANs have been applied successfully in image processing, but their application to speech enhancement still needs to be studied. Applying a GAN to speech enhancement follows the same principle as the GAN itself: two models are constructed, a generative model and a discriminative model, also called the generator and the discriminator, and the two are trained by competing against each other. The ultimate goal of a GAN is to generate new data, which in this setting means producing denoised speech. This paper studies the application of GANs to speech enhancement. It models speech enhancement with the conventional GAN mathematical model, and it also improves that model by adding a sparse factor. Speech enhanced by the GAN is then compared with the output of traditional speech enhancement methods. The experimental results show that GAN-enhanced speech achieves higher segSNR and PESQ scores than the traditional methods, which indicates that the GAN approach outperforms them.
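For readers unfamiliar with segSNR (segmental signal-to-noise ratio), one of the two scores named in the abstract, the following is a minimal sketch of how such a score is commonly computed. It is not taken from the paper: the function name `seg_snr`, the frame length of 256 samples, and the clamping range of -10 dB to 35 dB are illustrative assumptions based on common practice.

```python
import math


def seg_snr(clean, enhanced, frame_len=256, lo_db=-10.0, hi_db=35.0, eps=1e-10):
    """Mean per-frame SNR in dB between a clean reference and an enhanced signal.

    frame_len, lo_db, and hi_db are assumed conventions, not values from the paper.
    """
    n = min(len(clean), len(enhanced))
    frame_snrs = []
    # Walk over non-overlapping frames; a trailing partial frame is ignored.
    for start in range(0, n - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        est = enhanced[start:start + frame_len]
        signal_energy = sum(r * r for r in ref)
        noise_energy = sum((r - e) ** 2 for r, e in zip(ref, est))
        # eps keeps the ratio finite for silent or perfectly matched frames.
        snr_db = 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
        # Clamp each frame so a few extreme frames do not dominate the mean.
        frame_snrs.append(min(max(snr_db, lo_db), hi_db))
    if not frame_snrs:
        raise ValueError("signals are shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)
```

A higher value means the enhanced signal is closer to the clean reference, frame by frame. PESQ, the other score mentioned, is a standardized perceptual measure and is considerably more involved, so it is not sketched here.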
Authors
SUN Cheng-li (孙成立)
WANG Hai-wu (王海武)
School of Information Engineering, Nanchang Hangkong University, Nanchang 330063, China
Source
Computer Technology and Development (《计算机技术与发展》)
2019, No. 2, pp. 152-156, 161 (6 pages in total)
Funding
National Natural Science Foundation of China (61362031, 61401259, 61761031, 61263032)