Abstract
Nowadays, deep neural networks (DNNs) have been widely applied in various fields. However, research has shown that DNNs are vulnerable to adversarial examples (AEs), which seriously threatens their application and development. Most existing adversarial defense methods have to sacrifice part of the original classification accuracy to obtain defense capability and rely strongly on the knowledge provided by already generated AEs, so they cannot balance the effectiveness and efficiency of defense. Therefore, based on manifold learning, this study proposes an attackable-space hypothesis on the origin of AEs from the feature-space perspective and, on this basis, a trap-type ensemble adversarial defense network (Trap-Net). Trap-Net adds trap-class data to the training data of the original model and uses a trap-type smoothing loss function to establish inducing relationships between the target data classes and the trap data classes, so as to generate trap-type networks. To address the loss of original classification accuracy, ensemble learning is used to combine multiple trap-type networks, which enlarges the attackable target space defined by the trap labels in the feature space without sacrificing the original classification accuracy. Finally, Trap-Net judges whether input data are AEs by detecting whether they hit the attackable target space. Experiments on the MNIST, K-MNIST, F-MNIST, CIFAR-10, and CIFAR-100 datasets show that Trap-Net achieves strong defense generalization against AEs without sacrificing the classification accuracy on clean samples, and the experimental results validate the attackable-space hypothesis on the origin of AEs. In low-perturbation white-box attack scenarios, Trap-Net detects more than 85% of AEs. In high-perturbation white-box attack and black-box attack scenarios, its detection rate of AEs is close to 100%. Compared with other detection-based adversarial defense methods, Trap-Net is highly effective against both white-box and black-box adversarial attacks, providing an efficient robustness optimization method for DNNs in adversarial environments.
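As a rough illustration of the mechanism the abstract describes, the sketch below shows one possible reading of the trap-type smoothing loss and the ensemble trap-label detection in PyTorch. It is not the authors' implementation: the `trap_map` assignment of trap classes to each original class, the smoothing weight `eps`, and the "flag the input if any ensemble member predicts a trap class" voting rule are assumptions made here for illustration only.

```python
# Hypothetical sketch of the Trap-Net idea, NOT the paper's reference implementation.
# Assumptions: the trap-type smoothing loss moves a fraction `eps` of label mass from
# each original class onto its associated trap classes, and an input is flagged as
# adversarial when any ensemble member predicts a trap class.
import torch
import torch.nn.functional as F

def trap_smoothing_targets(labels, num_original, num_trap, trap_map, eps=0.1):
    """Build soft targets: 1 - eps on the true class, eps spread over its trap classes.

    labels:   (B,) original class indices in [0, num_original)
    trap_map: dict mapping each original class to a list of trap-class indices
              in [num_original, num_original + num_trap)
    """
    num_classes = num_original + num_trap
    targets = torch.zeros(labels.size(0), num_classes)
    for i, y in enumerate(labels.tolist()):
        traps = trap_map[y]
        targets[i, y] = 1.0 - eps
        targets[i, traps] = eps / len(traps)  # induce the target-class -> trap-class relation
    return targets

def trap_smoothing_loss(logits, labels, num_original, num_trap, trap_map, eps=0.1):
    """Cross-entropy against the trap-smoothed soft targets (one assumed formulation)."""
    targets = trap_smoothing_targets(labels, num_original, num_trap, trap_map, eps)
    return -(targets.to(logits.device) * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

@torch.no_grad()
def detect_adversarial(models, x, num_original):
    """Flag x as adversarial if any trap network predicts a trap class (index >= num_original)."""
    hits = [model(x).argmax(dim=1) >= num_original for model in models]
    return torch.stack(hits, dim=0).any(dim=0)  # (B,) boolean mask
```

In this sketch, each element of `models` would be a trap-type network trained on the original classes plus its own trap classes with `trap_smoothing_loss`; on clean inputs the returned mask should stay False, while perturbations crafted against the ensemble are intended to be drawn into the trap labels that define the attackable target space.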
Authors
SUN Jia-Ze (孙家泽)
WEN Su-Lei (温苏雷)
ZHENG Wei (郑炜)
CHEN Xiang (陈翔)
SUN Jia-Ze; WEN Su-Lei; ZHENG Wei; CHEN Xiang (School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an 710121, China; Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing (Xi'an University of Posts and Telecommunications), Xi'an 710121, China; Xi'an Key Laboratory of Big Data and Intelligent Computing (Xi'an University of Posts and Telecommunications), Xi'an 710121, China; School of Software, Northwestern Polytechnical University, Xi'an 710072, China; School of Information Science and Technology, Nantong University, Nantong 226019, China)
Source
Journal of Software (软件学报)
EI
CSCD
Peking University Core Journal (北大核心)
2024, No. 4, pp. 1861-1884 (24 pages)
Funding
National Natural Science Foundation of China (61876138, 62272387, 62141208)
National Key Research and Development Program of China (2020YFC0833105Z1)
Xi'an Key Industrial Chain Artificial Intelligence Core Technology Research Project (2022JH-RGZN-0028)
Key Research and Development Program of Shaanxi Province (2023-YBGY-030)
Innovation Fund of Xi'an University of Posts and Telecommunications (CXJJZL2021007)
Keywords
deep neural network (DNN)
adversarial example
ensemble learning
adversarial defense
robustness optimization