
Model functionality stealing attacks based on real data awareness
Abstract  Objective: Model functionality stealing is one of the core problems in artificial intelligence (AI) security. Such attacks aim to steal private information about a target model, including its structure, parameters, and functionality. Our research focuses on functionality stealing: we target a deep-learning-based multi-class image classifier and train a clone model that replicates the functionality of the black-box target classifier using only limited information related to it. Most current functionality-stealing attacks are query-based: they replicate the black-box target by analyzing query data and the target model's responses to it. Among these, attacks based on generative models are popular and have obtained promising results. A generator produces images that serve as query data, and the clone model is learned by constraining its predictions to be consistent with the target model's predictions on the same queries. However, two main challenges remain. First, target image classifiers are usually trained on real images, but because existing methods do not use ground-truth data to supervise the training of the generative model, the generated images degenerate into noise that is unrecognizable to the human eye. In other words, the query data carries little semantic information, so the target model's outputs provide little effective guidance for training the clone model, which restricts the clone model's quality. Second, training the generative model requires many queries to the target classifier, placing a severe burden on the query budget; since the target model is a black box, the generator can only be updated with approximate gradients obtained through zeroth-order gradient estimation and therefore never receives accurate gradient information. Method: To address these problems, we propose a new model stealing attack that combines generative adversarial nets (GAN) and contrastive learning to steal the functionality of image classifiers. With the help of real image data from public datasets, the GAN pushes the generator's outputs toward real images, strengthening the physical meaning of the target model's outputs so that its predictions provide effective guidance for training the clone model. In addition, to further improve the clone model's performance, we propose a new loss function based on the idea of contrastive learning for network optimization. Result: Experiments on two public datasets, CIFAR-10 (Canadian Institute for Advanced Research-10) and SVHN (street view house numbers), show that our method achieves good functionality stealing. On CIFAR-10, its stealing accuracy is 5% higher than that of current state-of-the-art methods, and under the same query budget it achieves better stealing performance, effectively reducing the cost of querying the target model. Conclusion: Starting from the perspective of data realism, the proposed attack effectively improves model functionality stealing against image classifiers and reduces the cost of querying the target model to some extent.
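The consistency constraint described above (training the clone so that its predictions agree with the black-box target's predictions on the same queries) can be sketched as a minimal distillation loop. The linear "target" and "clone" below are toy stand-ins for the paper's networks, the Gaussian queries stand in for generator output, and the KL divergence plays the role of the consistency loss:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical black-box target: a fixed linear classifier the attacker
# may only query for probabilities, never inspect.
W_target = rng.normal(size=(8, 3))
def query_target(x):
    return softmax(x @ W_target)

# Clone model (here the same architecture for simplicity; the attacker
# does not need to know the target's architecture in general).
W_clone = np.zeros((8, 3))

def kl(p, q, eps=1e-12):
    # Consistency loss: KL(target || clone), averaged over the query batch.
    return np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1))

lr = 0.5
for step in range(2000):
    x = rng.normal(size=(64, 8))   # stand-in for generated query images
    p = query_target(x)            # black-box responses
    q = softmax(x @ W_clone)
    # Gradient of the KL loss w.r.t. the clone's logits is (q - p);
    # backpropagate through the linear map and take a gradient step.
    grad = x.T @ (q - p) / len(x)
    W_clone -= lr * grad
```

After training, the clone's predictions closely match the target's on fresh queries, which is exactly the functionality-stealing objective in miniature.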
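Zeroth-order gradient estimation, which the abstract identifies as the only way to update the generator through a black-box target, can be illustrated with a two-point finite-difference estimator averaged over random directions. The quadratic `black_box` function below is a hypothetical stand-in for the attacker's loss, chosen so the true gradient is known:

```python
import numpy as np

rng = np.random.default_rng(1)

def black_box(x):
    # Stand-in for a scalar loss we can only evaluate, not differentiate.
    # True gradient: 2 * (x - 2).
    return np.sum((x - 2.0) ** 2)

def zo_gradient(f, x, num_dirs=1000, mu=1e-4):
    # Two-point zeroth-order estimate averaged over Gaussian directions u:
    #   g ≈ mean_u [ (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u ]
    # Unbiased up to O(mu^2) smoothing error, but noisy for finite num_dirs,
    # which is why such generators "cannot obtain accurate gradient information".
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.normal(size=x.shape)
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / num_dirs

est = zo_gradient(black_box, np.zeros(5))  # ≈ [-4, -4, -4, -4, -4]
```

Each estimate costs two queries per direction, which is the query-budget burden the abstract points out: accuracy improves only as more queries are spent.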
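The paper's specific contrastive loss is not given in this abstract; as a generic illustration of the contrastive-learning idea it builds on, a standard InfoNCE-style loss rewards each anchor embedding for being most similar to its own positive among all positives in the batch:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    # Generic InfoNCE contrastive loss (NOT the paper's proposed loss):
    # each row of `anchors` is pulled toward the matching row of `positives`
    # and pushed away from all other rows in the batch.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                     # pairwise similarities
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                 # -log p(correct pair)
```

Matched pairs yield a much lower loss than mismatched ones, which is the signal a contrastive objective adds on top of the plain consistency constraint.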
Authors: Li Yanming, Li Changsheng, Yu Jiaqi, Yuan Ye, Wang Guoren (School of Computer Science, Beijing Institute of Technology, Beijing 100081, China)
Source: Journal of Image and Graphics (《中国图象图形学报》, CSCD, Peking University Core), 2022, No. 9, pp. 2721-2732 (12 pages)
Keywords: model functionality stealing; generative model; contrastive learning; adversarial attack; artificial intelligence security