Abstract
A singing voice separation method for robots based on U-NET3D is proposed. To reduce computational complexity, 3D convolution is used only in the first layer of U-NET3D, where it learns from the multichannel input audio the amplitude and phase features produced by differing distances between the sound sources and the microphones. A NAO robot was used to record a 4-channel dataset of mixed music and singing with multiple sound sources; the music and singing voice played back for recording were taken from the iKala dataset. The mixture, accompaniment, and singing voice signals were sequence-matched by minimum Euclidean distance and then combined into 6-channel audio data. Experimental results show that the proposed method separates well in noisy environments and separates the target singing voice better than U-NET.
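The abstract does not spell out the matching procedure, so the following is only a minimal sketch of one plausible reading: a reference clip is slid along a longer recording, and the offset whose window has the smallest Euclidean distance to the clip is taken as the match. The function names `euclidean_distance` and `best_offset` and the toy sample values are hypothetical and are not taken from the paper.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length sample sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_offset(recording, reference):
    """Return (offset, distance) for the window of `recording` closest to `reference`.

    Assumption: both inputs are single-channel sample lists at the same
    sampling rate and on a comparable amplitude scale.
    """
    n = len(reference)
    if n == 0 or n > len(recording):
        raise ValueError("reference must be non-empty and no longer than recording")
    best = (0, float("inf"))
    # Exhaustive search over every possible alignment of the reference.
    for offset in range(len(recording) - n + 1):
        d = euclidean_distance(recording[offset:offset + n], reference)
        if d < best[1]:
            best = (offset, d)
    return best


# Toy data (hypothetical): the reference clip starts at index 3 of the recording.
reference = [0.5, -0.2, 0.9, 0.1]
recording = [0.0, 0.1, -0.1] + reference + [0.0, 0.2]
offset, distance = best_offset(recording, reference)
```

In this toy case the search recovers the offset at which the clip was embedded. On real recordings the amplitudes of the played-back and recorded signals would differ, so some normalization before computing distances would probably be needed; that step is likewise an assumption rather than something the abstract states.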
Authors
王大东
胡希颖
王晓宇
WANG Da-dong; HU Xi-ying; WANG Xiao-yu (College of Computer, Jilin Normal University, Siping 136000, China)
Source
《吉林师范大学学报(自然科学版)》
2021, No. 1, pp. 111-116 (6 pages)
Journal of Jilin Normal University:Natural Science Edition
Funding
Jilin Provincial Department of Education "13th Five-Year Plan" Science and Technology Planning Project (JJKH20180763K).