Adversarial Example Generation for Audio Classification Based on Time-Frequency Partitioned Perturbation

Abstract: Adversarial examples (AEs) generated by existing methods for audio classification generally suffer from a low attack success rate and are easily perceived. To address these problems, this paper first designs an audio AE generation framework based on time-frequency partitioned perturbation (TFPP). Leveraging the time-frequency characteristics of the audio signal, the framework divides the magnitude spectrum of the input signal into critical and non-critical regions and generates a corresponding adversarial perturbation for each. Building on this framework, the paper further proposes a generative adversarial network (GAN)-based AE generation method named TFPPGAN. TFPPGAN takes the partitioned magnitude spectra as input and uses adversarial training to adaptively adjust the perturbation constraint coefficients, optimizing the perturbations in the critical and non-critical regions simultaneously. Comparison experiments on three typical audio classification datasets show that, compared with baseline methods, TFPPGAN improves the attack success rate and signal-to-noise ratio of the AEs by 4.7% and 5.5 dB, respectively, and raises the perceptual quality evaluation score of the generated adversarial speech examples by 0.15. In addition, the paper theoretically analyzes the feasibility of combining the TFPP framework with other attack methods and experimentally verifies the effectiveness of this combination.
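The partitioning idea in the abstract can be illustrated with a minimal sketch. Nothing below is taken from the paper itself. The energy-quantile criterion for "critical" bins, the function names `partition_mask`, `apply_partitioned_perturbation`, and `snr_db`, the coefficient values, and the random perturbation standing in for a learned one are all assumptions made for illustration. In TFPPGAN the constraint coefficients are adjusted through adversarial training rather than fixed by hand, and SNR would normally be measured on waveforms rather than on magnitude spectra as done here.

```python
import numpy as np


def partition_mask(mag, quantile=0.8):
    """Mark time-frequency bins as critical (True) or non-critical (False).

    Assumption: bins whose magnitude is at or above the given quantile
    are treated as critical. The paper's actual criterion may differ.
    """
    threshold = np.quantile(mag, quantile)
    return mag >= threshold


def apply_partitioned_perturbation(mag, delta, mask,
                                   coef_critical=0.02, coef_noncritical=0.1):
    """Bound a perturbation separately in each region and add it to mag.

    The two coefficients are hypothetical fixed values, scaled by the
    peak magnitude. Magnitudes are kept non-negative after perturbation.
    """
    bound = np.where(mask, coef_critical, coef_noncritical) * mag.max()
    clipped = np.clip(delta, -bound, bound)
    return np.maximum(mag + clipped, 0.0)


def snr_db(clean, perturbed):
    """Signal-to-noise ratio in dB, computed here on magnitude spectra."""
    noise = perturbed - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))


# Toy data: a random array stands in for a magnitude spectrum (freq x time).
rng = np.random.default_rng(0)
mag = np.abs(rng.normal(size=(257, 100)))
mask = partition_mask(mag)
delta = rng.normal(scale=0.5, size=mag.shape)  # stand-in for a learned perturbation
adv = apply_partitioned_perturbation(mag, delta, mask)
print(f"critical bins: {mask.mean():.2%}, SNR: {snr_db(mag, adv):.2f} dB")
```

Keeping the two coefficients as separate parameters mirrors the abstract's description of region-specific constraints; which region should receive the tighter bound is left open here because the abstract does not say.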

Authors: ZHANG Xiongwei; ZHANG Qiang; YANG Jibin; SUN Meng; LI Yihao (College of Command & Control Engineering, Army Engineering University of PLA, Nanjing 210007, China)

Affiliation: College of Command & Control Engineering, Army Engineering University of PLA

Source: Journal of Army Engineering University of PLA, 2024, No. 1, pp. 1-11 (11 pages)

Funding: National Natural Science Foundation of China (62071484).

Keywords: audio classification; adversarial example; generative adversarial network; partitioned perturbation

Classification code: TP389.1 [Automation and Computer Technology: Computer Systems Architecture]
