Abstract
Multiple fundamental frequency estimation is widely used in music structure analysis, music-assisted education, information retrieval, and other fields. To meet the need for accurate recognition of random chords in music, a multiple fundamental frequency estimation algorithm based on de-imaging with generative adversarial networks is proposed. First, the complete audio is divided into note segments, and a homophonic fingerprint is proposed to extract the spectral features of each note segment. Then, a convolutional neural network identifies the current dominant fundamental frequency of the homophonic fingerprint, and the identified dominant fundamental frequency is treated as an image that interferes with recognition of the next fundamental frequency. This interfering image is removed by a generative adversarial network, and a new round of multiple fundamental frequency estimation is performed on the homophonic fingerprint from which the image has been removed. Finally, multiple fundamental frequency estimation of the complete chord is achieved through stage-by-stage iterative de-imaging. Experiments are carried out on a piano audio database composed of random two-note and random three-note chords. The results show that, compared with the classical iterative spectral deletion algorithm and the large-vocabulary chord recognition algorithm, the proposed algorithm adapts to the recognition of random chords, is highly robust across different pitch ranges, and clearly improves overall accuracy.
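The iterative loop described in the abstract (identify the dominant fundamental, remove its image, repeat) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the fingerprint is reduced to a one-dimensional toy spectrum, the convolutional neural network is replaced by a simple harmonic-sum classifier, and the generative adversarial network is replaced by plain template subtraction. The names `estimate_chord`, `toy_fingerprint`, `toy_classify_dominant`, and `toy_remove_image` are hypothetical.

```python
import numpy as np

# Assumed toy decay of harmonic amplitudes (not taken from the paper).
HARMONIC_WEIGHTS = np.array([1.0, 0.5, 0.25, 0.125])

def harmonic_bins(pitch_bin):
    """Bins of the first harmonics of a fundamental located at pitch_bin."""
    return pitch_bin * np.arange(1, len(HARMONIC_WEIGHTS) + 1)

def toy_fingerprint(notes, n_bins=80):
    """Toy stand-in for a fingerprint: sum of harmonic templates.

    notes maps a fundamental bin to an amplitude.
    """
    spectrum = np.zeros(n_bins)
    for pitch_bin, amplitude in notes.items():
        spectrum[harmonic_bins(pitch_bin)] += amplitude * HARMONIC_WEIGHTS
    return spectrum

def toy_classify_dominant(spectrum, min_salience=0.1):
    """Stand-in for the CNN: pick the bin with the largest harmonic sum."""
    max_pitch = (len(spectrum) - 1) // len(HARMONIC_WEIGHTS)
    candidates = np.arange(1, max_pitch + 1)
    salience = np.array(
        [spectrum[harmonic_bins(c)] @ HARMONIC_WEIGHTS for c in candidates]
    )
    best = int(np.argmax(salience))
    if salience[best] < min_salience:
        return None  # nothing left that looks like a note
    return int(candidates[best])

def toy_remove_image(spectrum, pitch_bin):
    """Stand-in for the GAN: subtract the harmonic template of pitch_bin."""
    cleaned = spectrum.copy()
    amplitude = spectrum[pitch_bin]
    cleaned[harmonic_bins(pitch_bin)] -= amplitude * HARMONIC_WEIGHTS
    return np.maximum(cleaned, 0.0)

def estimate_chord(fingerprint, classify_dominant, remove_image, max_notes=3):
    """Iteratively identify the dominant fundamental, then remove its image."""
    pitches = []
    current = fingerprint.copy()
    for _ in range(max_notes):
        pitch = classify_dominant(current)
        if pitch is None:
            break
        pitches.append(pitch)
        current = remove_image(current, pitch)
    return pitches  # in order of detection, dominant first

chord = toy_fingerprint({13: 1.0, 10: 0.8, 17: 0.6})
print(estimate_chord(chord, toy_classify_dominant, toy_remove_image))
```

Passing the classifier and the remover as arguments keeps the loop independent of how each stage is realised, so trained models could in principle be swapped in for the toy stand-ins.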
Authors
黎思泉
万永菁
蒋翠玲
LI Si-quan; WAN Yong-jing; JIANG Cui-ling (Department of Information Science and Engineering, East China University of Science and Technology, Shanghai 200000, China)
Source
《计算机科学》
CSCD
Peking University Core Journals (北大核心)
2022, No. 3, pp. 179-184 (6 pages)
Computer Science
Keywords
Multiple fundamental frequency estimation (多基频估计)
Homophonic fingerprint (谐音指纹图)
Generative adversarial networks (生成对抗网络)
Convolutional neural network (卷积神经网络)
Fundamental frequency image (基频影像)