Abstract
At present, most speaker recognition algorithms are developed for clean environments and perform poorly under noise. To improve the accuracy of speaker recognition in noisy environments, a new feature extraction method and recognition model, WPGT (wavelet packet & Gammatone), is proposed. The wavelet packet decomposes the signal into high-frequency and low-frequency subbands, and a Gammatone filter bank simulates the human auditory system to process the non-linear signal, so that more complete speaker voice features are extracted; a convolutional neural network is then trained on these features to perform speaker recognition. Using open-source speech data sets and noise-fused data sets, the proposed method is compared with the commonly used voiceprint feature extraction methods MFCC and Gammatone. Experimental results show that, in noisy environments, the recognition accuracy of WPGT is 10.63% and 16.91% higher than that of MFCC and Gammatone, respectively, demonstrating better noise robustness.
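The abstract does not give implementation details, so the following Python sketch only illustrates how a wavelet-packet plus Gammatone (WPGT-style) front end could be assembled with PyWavelets and SciPy. The db4 wavelet, 3-level decomposition, 32 ERB-spaced channels, and log-energy pooling are illustrative assumptions, not the authors' settings; the resulting feature matrix would then be fed to a CNN as described above.

    # Minimal sketch of a wavelet-packet + Gammatone feature pipeline (illustrative only).
    # Requires PyWavelets and SciPy >= 1.6 (for scipy.signal.gammatone).
    import numpy as np
    import pywt
    from scipy.signal import gammatone, lfilter

    def erb_space(low_hz, high_hz, n):
        """n centre frequencies equally spaced on the ERB-rate scale (Glasberg & Moore)."""
        ear_q, min_bw = 9.26449, 24.7
        hi = np.log(high_hz / ear_q + min_bw)
        lo = np.log(low_hz / ear_q + min_bw)
        return (np.exp(np.linspace(hi, lo, n)) - min_bw) * ear_q

    def wpgt_features(signal, fs=16000, wavelet="db4", level=3, n_channels=32):
        # 1) Wavelet packet decomposition: split the signal into 2**level subbands
        #    covering both low- and high-frequency content.
        wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, mode="symmetric",
                                maxlevel=level)
        subbands = [node.data for node in wp.get_level(level, order="freq")]

        # 2) Gammatone filter bank (IIR) approximating cochlear frequency analysis.
        centre_freqs = erb_space(50.0, fs / 2 - 100.0, n_channels)
        bank = [gammatone(fc, "iir", fs=fs) for fc in centre_freqs]

        # 3) Pass every subband through each Gammatone channel and keep the log
        #    energy -> an (n_subbands x n_channels) matrix a CNN could consume.
        feats = np.empty((len(subbands), n_channels))
        for i, band in enumerate(subbands):
            for j, (b, a) in enumerate(bank):
                y = lfilter(b, a, band)
                feats[i, j] = np.log(np.sum(y ** 2) + 1e-10)
        return feats

    if __name__ == "__main__":
        x = np.random.randn(16000)        # 1 s of noise as a stand-in utterance
        print(wpgt_features(x).shape)     # -> (8, 32)

This is a sketch under the stated assumptions; the paper's actual frame lengths, subband/channel counts, and feature normalisation may differ.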
Authors
XU Xiaomeng (徐晓梦), TAN Zhenhua (谭振华), LI Xinshu (李欣书)
Software College, Northeastern University, Shenyang 110819, Liaoning Province, P. R. China
Source
Journal of Shenzhen University (Science and Engineering) (深圳大学学报(理工版))
Indexed in EI, CAS, CSCD, and the Peking University Core Journal List (北大核心)
2020, No. S01, pp. 84-91 (8 pages)
Funding
National Key Research and Development Program of China (2019YFB1405803); Next Generation Internet Technology Innovation Project (NGII20190609).
Keywords
biometric identification
speaker recognition
wavelet packet
convolutional neural network