
Differential Privacy Federated Learning Method Based on Knowledge Distillation
Abstract: As a privacy protection technique, differential privacy has been widely applied in federated learning. Existing research on applying differential privacy to federated learning either fails to consider unlabeled public data or ignores the difference in data volume between clients, which limits its application in real-world scenarios. This paper proposes a differential privacy federated learning method based on knowledge distillation, which introduces an unlabeled public dataset, accounts for the differences in data volume between clients, and designs a dedicated differential privacy scheme for this scenario. First, clients are grouped by data volume into "large-data clients" and "general clients"; a teacher model is trained on the large-data clients' data, and the teacher model assigns pseudo labels to the public dataset. The public dataset then participates in federated training as a "special client" alongside the general clients. Differential privacy is adopted to protect the clients' data. Because only the labels of the special client's data involve privacy, it is allocated a larger privacy budget than the general clients during federated training. The total privacy budget is capped: the budget of the federated training stage is set to a fixed value, and the budget of the pseudo-labeling stage is adjusted according to the clients' privacy requirements and the parallel composition property of the privacy budget. Experiments on the MNIST and SVHN datasets show that, under the same privacy budget consumption, the proposed method trains models with higher accuracy than traditional methods. The scheme is extensible, and its highly flexible privacy budget allocation allows it to meet complex privacy requirements.
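The abstract's budget accounting can be made concrete. The following is a minimal Python sketch, not the authors' implementation: the function names (`allocate_budgets`, `noisy_pseudo_label`), the example epsilon values, and the PATE-style report-noisy-max labeling mechanism are illustrative assumptions (the paper trains a single teacher model; noisy-max over vote counts is a standard stand-in here). Only the arithmetic follows the abstract: the federated-stage budget is fixed, the labeling stage receives the remainder of the cap, and parallel composition over disjoint client datasets takes the maximum rather than the sum.

```python
import numpy as np

def allocate_budgets(eps_total, eps_fed_special, eps_fed_general):
    """Split a capped total privacy budget between the two stages.

    eps_total       : overall budget cap (hypothetical parameter)
    eps_fed_special : fixed federated-stage budget of the "special client";
                      larger than the general clients' budget because only
                      its pseudo labels are privacy-sensitive
    eps_fed_general : fixed federated-stage budget of a general client
    """
    assert eps_fed_special > eps_fed_general  # the abstract's allocation rule
    # Sequential composition: the pseudo labels are consumed in both stages,
    # so the labeling stage gets whatever the cap leaves over.
    eps_label = eps_total - eps_fed_special
    assert eps_label > 0, "federated-stage budget exceeds the total cap"
    # Parallel composition: client datasets are disjoint, so the overall
    # guarantee is the maximum over clients, not the sum.
    eps_overall = max(eps_label + eps_fed_special, eps_fed_general)
    return eps_label, eps_overall

def noisy_pseudo_label(vote_counts, eps):
    """Report-noisy-max: Laplace(1/eps) noise on per-class counts
    (sensitivity 1), then release the argmax as the pseudo label."""
    noise = np.random.laplace(scale=1.0 / eps, size=len(vote_counts))
    return int(np.argmax(np.asarray(vote_counts, dtype=float) + noise))

# Example: cap the total at eps = 8, fix the federated stage, label with the rest.
eps_label, eps_overall = allocate_budgets(
    eps_total=8.0, eps_fed_special=5.0, eps_fed_general=3.0)
votes = [12, 3, 85, 0, 0, 0, 0, 0, 0, 0]            # teacher votes, 10 classes
label = noisy_pseudo_label(votes, eps_label / 100)  # per-query share of eps_label
print(eps_label, eps_overall, label)
```

Note that each pseudo-label query consumes part of the labeling-stage budget under sequential composition, which is why the per-query epsilon above is a small fraction of `eps_label`.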
Authors: TAN Zhiwen, XU Ruzhi, WANG Naiyu, LUO Dan (School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China)
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journals, 2024, Supplement S01, pp. 906-913 (8 pages)
Fund: National Natural Science Foundation of China (61972148).
Keywords: Federated learning; Differential privacy; Knowledge distillation; Privacy protection; Privacy budget