Abstract
This paper studies speech denoising for Tibetan and proposes a Tibetan speech denoising algorithm based on spectral mapping with a convolutional long short-term memory (ConvLSTM) network. The algorithm consists of four modules: data preparation, feature extraction, the network, and audio restoration. The dataset consists of clean Lhasa Tibetan speech and noisy speech created by adding each of six single noise types from the NOISE-92 noise library. Log-power spectrum (LPS) features extracted from the noisy and clean speech are used to train the network. Its performance is evaluated with two metrics: perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI). Experimental results show that the algorithm removes non-stationary noise more effectively than stationary noise, and that its denoising performance improves as the signal-to-noise ratio (SNR) increases. At low SNRs, the algorithm outperforms spectral subtraction and the minimum mean square error (MMSE) method on non-stationary noise.
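The abstract names the data preparation, feature extraction and audio restoration modules but does not give their parameters. The sketch below shows one common way to implement them with NumPy. It assumes a 512-point STFT with 50% overlap, a square-root periodic Hann window, the natural logarithm for the LPS, and reuse of the noisy phase during restoration. These settings, and the names `mix_at_snr`, `log_power_spectrum` and `restore`, are hypothetical illustrations, not the authors' configuration. The ConvLSTM network itself is omitted: in a spectral-mapping system a trained model would map the noisy LPS to an estimate of the clean LPS between the feature-extraction and restoration steps.

```python
import numpy as np

# Hypothetical analysis settings; the paper's abstract does not state them.
N_FFT = 512          # frame length in samples
HOP = N_FFT // 2     # 50% overlap
EPS = 1e-12          # guards against log(0) and division by zero
# Periodic Hann window, square-rooted so analysis window * synthesis window = Hann.
WINDOW = np.sqrt(np.hanning(N_FFT + 1)[:-1])


def mix_at_snr(clean, noise, snr_db):
    """Data preparation: add `noise` to `clean` at the requested SNR in dB."""
    noise = np.resize(noise, clean.shape)  # repeat or truncate noise to match length
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + EPS
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise


def stft(x):
    """Frame the signal, window each frame, and take a one-sided FFT."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WINDOW for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, N_FFT // 2 + 1)


def istft(spec, length):
    """Inverse FFT each frame, apply the synthesis window, and overlap-add."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1) * WINDOW
    out = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
    return out


def log_power_spectrum(x):
    """Feature extraction: LPS = log(|STFT|^2); the phase is kept for restoration."""
    spec = stft(x)
    return np.log(np.abs(spec) ** 2 + EPS), np.angle(spec)


def restore(lps, phase, length):
    """Audio restoration: LPS -> magnitude, combine with the (noisy) phase, invert."""
    magnitude = np.sqrt(np.exp(lps))
    return istft(magnitude * np.exp(1j * phase), length)
```

For example, `noisy = mix_at_snr(clean, noise, 0.0)` followed by `lps, phase = log_power_spectrum(noisy)` produces the network input. Passing the network's predicted LPS and the noisy `phase` to `restore(..., len(noisy))` then yields the enhanced waveform. With 50% overlap the squared square-root Hann windows sum to one, so an unmodified LPS reconstructs the input exactly except near the signal edges.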
Authors
WANG Junbao
WANG Xi
BIANBA Wangdui
(School of Information Science and Technology, Lhasa 850000, China; National Experimental Teaching Demonstration Center of Information Technology, Lhasa 850000, China)
Source
Audio Engineering (《电声技术》)
2022, No. 6, pp. 47-53 (7 pages)
Keywords
Tibetan language denoising
logarithmic power spectrum
convolutional long short-term memory network