Acoustic source localization(ASL)and sound event detection(SED)are two widely pursued independent research fields.In recent years,in order to achieve a more complete spatial and temporal representation of sound field,...Acoustic source localization(ASL)and sound event detection(SED)are two widely pursued independent research fields.In recent years,in order to achieve a more complete spatial and temporal representation of sound field,sound event localization and detection(SELD)has become a very active research topic.This paper presents a deep learning-based multioverlapping sound event localization and detection algorithm in three-dimensional space.Log-Mel spectrum and generalized cross-correlation spectrum are joined together in channel dimension as input features.These features are classified and regressed in parallel after training by a neural network to obtain sound recognition and localization results respectively.The channel attention mechanism is also introduced in the network to selectively enhance the features containing essential information and suppress the useless features.Finally,a thourough comparison confirms the efficiency and effectiveness of the proposed SELD algorithm.Field experiments show that the proposed algorithm is robust to reverberation and environment and can achieve higher recognition and localization accuracy compared with the baseline method.展开更多
目的双目视差估计可以实现稠密的深度估计,因而具有重要研究价值。而视差估计和光流估计两个任务之间具有相似性,在两种任务之间可以互相借鉴并启迪新算法。受光流估计高效算法RAFT(recurrent all-pairs field transforms)的启发,本文...目的双目视差估计可以实现稠密的深度估计,因而具有重要研究价值。而视差估计和光流估计两个任务之间具有相似性,在两种任务之间可以互相借鉴并启迪新算法。受光流估计高效算法RAFT(recurrent all-pairs field transforms)的启发,本文提出采用单、双边多尺度相似性迭代查找的方法实现高精度的双目视差估计。针对方法在不同区域估计精度和置信度不一致的问题,提出了左右图像视差估计一致性检测提取可靠估计区域的方法。方法采用金字塔池化模块、跳层连接和残差结构的特征网络提取具有强表征能力的表示向量,采用向量内积表示像素间的相似性,通过平均池化得到多尺度的相似量,第0次迭代集成初始视差量,根据初始视差单方向向左查找多尺度的相似性得到的大视野相似量和上下文3种信息,而其他次迭代集成更新的视差估计量,根据估计视差双向查找多尺度的相似性得到的大视野相似量和上下文3种信息,集成信息通过第0次更新的卷积循环神经网络和其他次更新共享的卷积循环神经网络迭代输出视差的更新量,多次迭代得到最终的视差估计值。之后,通过对输入左、右图像反序和左右翻转估计右图视差,对比左、右图匹配点视差差值的绝对值和给定阈值之差判断视差估计置信度,从而实现可靠区域提取。结果本文方法在Sceneflow数据集上得到了与先进方法相当的精度,平均误差只有0.84像素,并且推理时间有相对优势,可以和精度之间通过控制迭代次数灵活平衡。可靠区域提取后,Sceneflow数据集上误差进一步减小到了历史最佳值0.21像素,在KITTI(Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago)双目测试数据集上,估计区域评估指标最优。结论本文方法对于双目视差估计具有优越性能,可靠区域提取方法能高效提取高精度估计区域,极大地提升了�展开更多
基金supported by the National Natural Science Foundation of China(61877067)the Foundation of Science and Technology on Near-Surface Detection Laboratory(TCGZ2019A002,TCGZ2021C003,6142414200511)the Natural Science Basic Research Program of Shaanxi(2021JZ-19)。
文摘Acoustic source localization(ASL)and sound event detection(SED)are two widely pursued independent research fields.In recent years,in order to achieve a more complete spatial and temporal representation of sound field,sound event localization and detection(SELD)has become a very active research topic.This paper presents a deep learning-based multioverlapping sound event localization and detection algorithm in three-dimensional space.Log-Mel spectrum and generalized cross-correlation spectrum are joined together in channel dimension as input features.These features are classified and regressed in parallel after training by a neural network to obtain sound recognition and localization results respectively.The channel attention mechanism is also introduced in the network to selectively enhance the features containing essential information and suppress the useless features.Finally,a thourough comparison confirms the efficiency and effectiveness of the proposed SELD algorithm.Field experiments show that the proposed algorithm is robust to reverberation and environment and can achieve higher recognition and localization accuracy compared with the baseline method.
文摘目的双目视差估计可以实现稠密的深度估计,因而具有重要研究价值。而视差估计和光流估计两个任务之间具有相似性,在两种任务之间可以互相借鉴并启迪新算法。受光流估计高效算法RAFT(recurrent all-pairs field transforms)的启发,本文提出采用单、双边多尺度相似性迭代查找的方法实现高精度的双目视差估计。针对方法在不同区域估计精度和置信度不一致的问题,提出了左右图像视差估计一致性检测提取可靠估计区域的方法。方法采用金字塔池化模块、跳层连接和残差结构的特征网络提取具有强表征能力的表示向量,采用向量内积表示像素间的相似性,通过平均池化得到多尺度的相似量,第0次迭代集成初始视差量,根据初始视差单方向向左查找多尺度的相似性得到的大视野相似量和上下文3种信息,而其他次迭代集成更新的视差估计量,根据估计视差双向查找多尺度的相似性得到的大视野相似量和上下文3种信息,集成信息通过第0次更新的卷积循环神经网络和其他次更新共享的卷积循环神经网络迭代输出视差的更新量,多次迭代得到最终的视差估计值。之后,通过对输入左、右图像反序和左右翻转估计右图视差,对比左、右图匹配点视差差值的绝对值和给定阈值之差判断视差估计置信度,从而实现可靠区域提取。结果本文方法在Sceneflow数据集上得到了与先进方法相当的精度,平均误差只有0.84像素,并且推理时间有相对优势,可以和精度之间通过控制迭代次数灵活平衡。可靠区域提取后,Sceneflow数据集上误差进一步减小到了历史最佳值0.21像素,在KITTI(Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago)双目测试数据集上,估计区域评估指标最优。结论本文方法对于双目视差估计具有优越性能,可靠区域提取方法能高效提取高精度估计区域,极大地提升了�