
RefineNet-based End-to-end Speech Enhancement (基于RefineNet的端到端语音增强方法)
Cited by: 3
Abstract: To improve a neural network's ability to process the time-domain waveform of a speech signal directly, this paper proposes an end-to-end speech enhancement method based on RefineNet. A time-frequency analysis neural network is constructed to simulate the short-time Fourier transform used in speech signal processing, and RefineNet is used to learn the feature mapping from noisy speech to clean speech. In the model training phase, a multi-objective joint optimization strategy integrates the speech enhancement evaluation metrics short-time objective intelligibility (STOI) and source-to-distortion ratio (SDR) into the training loss function. In comparative experiments against representative conventional methods and end-to-end deep learning methods, the proposed algorithm achieves the best results on all objective evaluation metrics and shows better noise robustness under unseen noise and low-SNR conditions.
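The abstract outlines two concrete ingredients: a learnable time-frequency analysis front end that mimics the short-time Fourier transform, and a training loss that folds SDR (and STOI) into the objective. Below is a minimal, hypothetical PyTorch sketch of those two ideas, not the authors' implementation; the class name TFAnalysis, the functions sdr_loss and joint_loss, and parameters such as win_length, hop_length, and alpha are illustrative assumptions, and the paper's STOI term is omitted (a differentiable STOI surrogate would be added to joint_loss in the same additive way).

```python
# Hypothetical sketch only (not the paper's released code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TFAnalysis(nn.Module):
    """Learnable time-frequency analysis front end: a strided Conv1d whose
    kernel size / stride play the role of the STFT window length / hop size,
    with filters learned from data instead of fixed Fourier bases."""
    def __init__(self, n_filters=256, win_length=512, hop_length=128):
        super().__init__()
        self.conv = nn.Conv1d(1, n_filters, kernel_size=win_length,
                              stride=hop_length, padding=win_length // 2,
                              bias=False)

    def forward(self, wav):                  # wav: (batch, samples)
        feats = self.conv(wav.unsqueeze(1))  # (batch, n_filters, frames)
        return F.relu(feats)

def sdr_loss(est, ref, eps=1e-8):
    """Negative source-to-distortion ratio in dB; minimizing it raises SDR."""
    num = (ref ** 2).sum(dim=-1)
    den = ((ref - est) ** 2).sum(dim=-1)
    return -10.0 * torch.log10(num / (den + eps) + eps).mean()

def joint_loss(est, ref, alpha=0.5):
    """Multi-objective joint loss: waveform MSE plus the SDR term.
    A differentiable STOI surrogate would be added here in the same way."""
    return alpha * F.mse_loss(est, ref) + (1.0 - alpha) * sdr_loss(est, ref)

# Toy usage with random 1-second, 16 kHz waveforms.
noisy = torch.randn(4, 16000)
clean = torch.randn(4, 16000)
feats = TFAnalysis()(noisy)        # learned "spectrogram"-like features
loss = joint_loss(noisy, clean)    # joint training objective
```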
Authors: LAN Tian, PENG Chuan, LI Sen, QIAN Yu-Xin, CHEN Cong, LIU Qiao (School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China; CETC Big Data Research Institute Co., Ltd., Guiyang 550008, China)
Source: Acta Automatica Sinica (《自动化学报》), 2022, No. 2, pp. 554-563 (10 pages). Indexed by EI, CAS, CSCD, and the Peking University Core Journal list.
Funding: Supported by the National Natural Science Foundation of China (U19B2028, 61772117), the Science and Technology Commission Innovation Special Zone Project (19-163-21-TS-001-042-01), the Key Project of the National Engineering Laboratory for Big Data Application Technology to Improve Government Governance Capability (10-2018039), the Sichuan Science and Technology Service Industry Demonstration Project (2018GFW0150), and the Fundamental Research Funds for the Central Universities (ZYGX2019J077).
Keywords: speech enhancement, end-to-end, RefineNet, multi-objective joint optimization, deep neural network
