
Audio Scene Classification Based on Audio Spectrogram Transformer (cited by: 3)
Abstract Audio scene classification is an important part of scene understanding. Learning the characteristics of audio scenes and classifying them accurately strengthens a machine's ability to interact with its environment, which matters increasingly in the era of big data. Classification performance depends on dataset size, yet practical tasks often face a severe shortage of data. To address this, the paper proposes a data augmentation and model pre-training strategy and applies the audio spectrogram transformer model to audio scene classification. First, the log-Mel energy spectrogram of the audio signal is extracted and fed into the model; next, the model's dynamic interaction capability strengthens the spatial relationships within the audio sequence; finally, classification is performed from the token vector. Tested on the public DCASE2019 Task 1 and DCASE2020 Task 1 datasets, the method reaches classification accuracies of 96.489% and 93.227%, respectively, a clear improvement over existing algorithms. This indicates that the method suits high-accuracy audio scene classification and lays a foundation for intelligent devices to perceive environmental content and detect environmental changes with high accuracy.
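Authors YUAN Shuang; YANG Lidong; GUO Yong; NIU Dawei; ZHANG Dandan (School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou, Inner Mongolia 014010, China)
Source Journal of Signal Processing (《信号处理》), CSCD, Peking University Core Journal, 2023, No. 4, pp. 730-736 (7 pages)
Funding Supported by the National Natural Science Foundation of China (62161040), the Inner Mongolia Science and Technology Plan Project (2021GG0023), the Natural Science Foundation of Inner Mongolia (2021MS06030), and the Program for Young Talents of Science and Technology in Universities of Inner Mongolia Autonomous Region (NJYT22056).
Keywords audio scene classification; transformer; pre-training; data augmentation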
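The first step named in the abstract, extracting a log-Mel energy spectrogram from the audio signal, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the HTK-style mel formula, the Hann window, the naive DFT, and every parameter value (8 kHz sample rate, 64-sample frames, hop of 32, 8 mel bands, a 440 Hz test tone) are assumptions chosen to keep the example small and self-contained. A real pipeline would likely use a library such as librosa or torchaudio with far larger settings.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert Hz to mel (HTK-style formula; an assumption, not from the paper)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular mel filters over the n_fft // 2 + 1 non-negative frequency bins."""
    n_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge points equally spaced on the mel scale, mapped back to Hz.
    edges_hz = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        left, center, right = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - left) / (center - left)
            falling = (right - f) / (right - center)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Power spectrum of one frame via a naive DFT (slow, but dependency-free)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def log_mel_spectrogram(signal, sample_rate, n_fft=64, hop=32, n_mels=8, eps=1e-10):
    """Return a list of frames, each a list of n_mels log-mel energies."""
    # Hann window applied to every frame before the DFT.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * t / n_fft) for t in range(n_fft)]
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(n_fft)]
        power = power_spectrum(frame)
        # eps keeps the logarithm finite for bands with no energy.
        frames.append([math.log(sum(w * p for w, p in zip(row, power)) + eps) for row in bank])
    return frames


# Illustrative input: a 440 Hz sine tone, 512 samples at 8 kHz.
sr = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * t / sr) for t in range(512)]
spec = log_mel_spectrogram(tone, sr)
print(len(spec), len(spec[0]))  # prints: 15 8
```

The result is a time-by-frequency grid (here 15 frames by 8 mel bands). A grid of this kind is what the abstract describes as the input to the spectrogram transformer.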
