Abstract
Neural-network-based speech enhancement models operate directly in the time domain or the time-frequency domain, which makes them computationally expensive and hard to deploy on low-compute platforms. To address this problem, a speech enhancement algorithm based on a two-stage recurrent neural network is proposed that greatly reduces complexity while preserving performance. The algorithm consists of two sub-networks. In the first stage, a recurrent neural network models the Mel sub-band features of the speech and predicts a magnitude spectral mask to enhance the speech magnitude. In the second stage, a recurrent neural network estimates the noise magnitude, which is combined with a phase spectrum compensation algorithm to compensate the speech phase. Optimizing the two stages in parallel yields good enhancement performance. Experimental results show that, compared with the baseline model, the proposed algorithm still performs well on objective speech metrics at lower complexity.
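The two stages described in the abstract can be illustrated for a single frame with a minimal NumPy sketch. It is not the authors' implementation: the recurrent networks are omitted, and their outputs (a per-band mask and a noise magnitude estimate) are simply passed in as arrays. The function names, the linear interpolation from Mel-spaced band centers to FFT bins, the antisymmetric compensation function, and the constant `lam` are all assumptions made for illustration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def band_mask_to_bins(band_mask, n_fft, sample_rate):
    """Interpolate a per-band mask (bands spaced evenly on the Mel scale)
    to one gain per FFT bin in [0, Nyquist]. Hypothetical helper."""
    n_bins = n_fft // 2 + 1
    centers_hz = mel_to_hz(
        np.linspace(0.0, hz_to_mel(sample_rate / 2.0), len(band_mask))
    )
    bin_hz = np.arange(n_bins) * sample_rate / n_fft
    return np.interp(bin_hz, centers_hz, band_mask)

def compensated_phase(noisy_spec, noise_mag, lam):
    """Phase spectrum compensation: add an antisymmetric, noise-scaled
    real offset to the full-length noisy spectrum and keep only the phase."""
    n_fft = len(noisy_spec)
    psi = np.zeros(n_fft)
    psi[1:n_fft // 2] = 1.0        # positive-frequency bins
    psi[n_fft // 2 + 1:] = -1.0    # negative-frequency bins
    return np.angle(noisy_spec + lam * psi * noise_mag)

def enhance_frame(frame, band_mask, noise_mag, sample_rate=16000, lam=3.0):
    """Stage 1: scale the magnitude with the band mask.
    Stage 2: replace the phase with the compensated phase.
    frame must have even length; noise_mag has the same length as frame.
    lam=3.0 is an illustrative value, not taken from the paper."""
    n_fft = len(frame)
    spec = np.fft.fft(frame)
    half = band_mask_to_bins(band_mask, n_fft, sample_rate)
    gains = np.concatenate([half, half[-2:0:-1]])  # mirror to full length
    mag = gains * np.abs(spec)
    phase = compensated_phase(spec, noise_mag, lam)
    # The compensated spectrum is no longer conjugate-symmetric,
    # so only the real part of the inverse transform is kept.
    return np.real(np.fft.ifft(mag * np.exp(1j * phase)))
```

With an all-ones mask and a zero noise estimate the frame passes through unchanged, which is a quick sanity check on the mirroring and the inverse transform. In the described system the mask and the noise magnitude would come from the first-stage and second-stage networks respectively.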
Authors
ZHANG Linzhi (章琳志)
LIU Mengqiang (刘梦强)
ZHANG Ye (张夜)
ZHANG Yankai (张燕凯)
Key Laboratory of Functional Materials and Devices for Informatics of Anhui Higher Education Institutes, Fuyang Normal University, Fuyang 236037, China
Source
Network New Media Technology (《网络新媒体技术》)
2023, No. 5, pp. 45-50 (6 pages)
Funding
Fuyang Normal University Industry-University-Research Cooperation Project (Grant No. HX2022071000).
Keywords
speech enhancement
neural network
Mel scale
phase spectrum compensation
model complexity