Reverberation-aware microphone array speech enhancement algorithm based on deep learning

Abstract: [Objective] To address the data dependence of microphone array algorithms based on deep-neural-network spectral estimation, a reverberation-aware microphone array speech enhancement algorithm based on deep learning is proposed. [Methods] First, the microphone array beamforming output is cross-correlated with the original signal, capturing the reverberation characteristics of the current environment in the form of an approximate room impulse response, which serves as the input to an LSTM network. The network is trained with clean speech as the target and outputs a room impulse response generalized vector. Finally, the approximate room impulse response and the generalized vector are combined to obtain the coefficients of a post de-reverberation filter, which performs the speech enhancement. [Results] In simulations and experiments, the proposed method outperforms beamforming, the weighted prediction error algorithm, and a conventional deep-learning de-reverberation algorithm across different reverberant scenes. [Conclusions] The method shows relatively stable de-reverberation capability across different reverberant scenes and generalizes well.

[Objective] Microphone array techniques are widely used for speech enhancement because they exploit the spatial information provided by multiple microphone channels. However, reverberation characteristics vary with room size, boundary materials, and reflectors, and this variation significantly degrades the speech enhancement performance of microphone arrays. In recent years, deep-learning-optimized microphone array signal processing has been investigated to mitigate the problems caused by reverberation, but it suffers from data dependence and therefore cannot adapt to reverberant scenes that are absent from the training data. This paper proposes a novel reverberation-aware (RA) microphone array speech enhancement algorithm that first obtains the reverberation features and then uses a deep-learning model to decouple the negative impact of the environment, enabling environment-adaptive microphone array speech enhancement under diverse reverberant scenarios. [Methods] The proposed RA algorithm consists of a training stage and a testing stage. In the training stage, the simulated reverberant signal is correlated with its beamforming output to obtain an approximate room impulse response (ARIR). With clean speech as the training target, an RA model is then trained using the ARIR and the beamformed signal as input. The model produces a room impulse response (RIR) generalized vector (RGV), which generalizes the de-reverberation model with respect to diverse RIRs as well as uncontrolled speech. In the testing stage, the ARIR is obtained in the same way, by correlating the received reverberant signal with its beamforming output. The resulting RGV is then convolved with this ARIR to obtain the coefficients of a post de-reverberation filter, which removes the reverberation corresponding to the ARIR. [Results] Performance of the proposed RA speech enhancement algo…
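The signal flow described in the [Methods] part of the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the beamformer, the ARIR length, the normalization, or the LSTM architecture. The delay-and-sum beamformer, the function names (`delay_and_sum`, `approximate_rir`, `post_filter_coefficients`), and the identity RGV standing in for the trained network output are all assumptions made for illustration.

```python
import numpy as np


def delay_and_sum(channels, delays):
    """Average the channels after advancing each by its integer sample delay.

    channels: array of shape (num_mics, num_samples).
    delays: non-negative integer steering delays (assumed known here).
    """
    num_mics, num_samples = channels.shape
    out = np.zeros(num_samples)
    for channel, delay in zip(channels, delays):
        out[: num_samples - delay] += channel[delay:]
    return out / num_mics


def approximate_rir(received, beamformed, length):
    """Cross-correlate the received signal with the beamformed output.

    Keeps lags 0..length-1 and normalizes by the beamformed energy
    (the normalization is an assumption, not taken from the paper).
    """
    full = np.correlate(received, beamformed, mode="full")
    zero_lag = len(beamformed) - 1  # index of lag 0 in "full" mode
    arir = full[zero_lag : zero_lag + length]
    return arir / (np.dot(beamformed, beamformed) + 1e-12)


def post_filter_coefficients(arir, rgv):
    """Convolve the ARIR with the RGV to form the post-filter taps."""
    return np.convolve(arir, rgv)


# Toy usage with synthetic data (all values are hypothetical).
rng = np.random.default_rng(0)
source = rng.standard_normal(4000)
room = np.array([1.0, 0.0, 0.5, 0.0, 0.25])        # toy impulse response
mic0 = np.convolve(source, room)[: len(source)]     # reverberant channel
mic1 = np.concatenate([np.zeros(2), mic0[:-2]])     # same signal, 2 samples later
beamformed = delay_and_sum(np.stack([mic0, mic1]), [0, 2])

arir = approximate_rir(mic0, beamformed, length=8)
rgv = np.array([1.0])  # placeholder for the LSTM output; identity keeps the ARIR unchanged
coeffs = post_filter_coefficients(arir, rgv)
enhanced = np.convolve(beamformed, coeffs)[: len(beamformed)]
```

With a trained model, `rgv` would be the network output rather than a fixed vector. The sketch only shows where each quantity named in the abstract enters the processing chain.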

Authors: 何伟 (HE Wei); 刘雨佶 (LIU Yuji); 童峰 (TONG Feng); 康元勋 (KANG Yuanxun); 冯万健 (FENG Wanjian). College of Ocean and Earth Sciences, Xiamen University, Xiamen 361005, China; National and Local Joint Engineering Research Center for Navigation and Location Service Technology of Xiamen University, Xiamen 361005, China; Xiamen Yealink Network Technology Co., Ltd., Xiamen 361015, China

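Affiliations: College of Ocean and Earth Sciences, Xiamen University; National and Local Joint Engineering Research Center for Navigation and Location Service Technology (Xiamen University); Xiamen Yealink Network Technology Co., Ltd.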

Source: Journal of Xiamen University (Natural Science) (《厦门大学学报（自然科学版）》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2024, No. 2, pp. 287-295, 304 (10 pages)

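Funding: Science and Technology Commission of Shanghai Municipality, "Science and Technology Innovation Action Plan" project (21DZ1205502); Xiamen Marine Industry Project (22CZB012HJ13).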

Keywords: reverberation; microphone array; beamforming; room impulse response; deep learning; long short-term memory (LSTM)

Classification: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]
