Abstract
Differential privacy is widely used as a privacy-preserving technique in federated learning. Existing work on differential privacy in federated learning either ignores unlabeled public data or ignores differences in data volume across clients, which limits its applicability in real-world scenarios. This paper proposes a differentially private federated learning method based on knowledge distillation that introduces an unlabeled public dataset, accounts for differences in data volume across clients, and designs a dedicated differential privacy scheme for this setting. First, clients are divided by data volume into "large-data clients" and "general clients". A teacher model is trained on the data of the large-data clients and assigns pseudo-labels to the public dataset. The public dataset then takes part in federated training as a "special client" alongside the general clients. Differential privacy is used to protect the clients' data. Because only the labels of the special client's data are privacy-sensitive, the special client is allocated a larger privacy budget than the general clients during federated training. The total privacy budget is capped: the budget for the federated training stage is fixed, and the budget for the pseudo-labeling stage is adjusted according to the clients' privacy requirements and the parallel composition property of privacy budgets. Experiments on the MNIST and SVHN datasets show that, at the same privacy budget consumption, the method trains a more accurate model than traditional methods. The scheme is extensible, and its highly flexible privacy budget allocation allows it to meet complex privacy requirements.
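The grouping step and the budget bookkeeping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Client` class, the size threshold, and all epsilon values are hypothetical, and it assumes pure ε-differential privacy with the standard basic composition rules (budgets add up on the same data, and the maximum applies across disjoint data).

```python
from dataclasses import dataclass


@dataclass
class Client:
    name: str          # hypothetical identifier
    num_samples: int   # size of the client's local dataset


def split_clients(clients, threshold):
    """Split clients into (large-data, general) by a hypothetical size threshold."""
    large = [c for c in clients if c.num_samples >= threshold]
    general = [c for c in clients if c.num_samples < threshold]
    return large, general


def sequential_epsilon(epsilons):
    """Basic sequential composition: mechanisms run on the SAME data add up."""
    return sum(epsilons)


def parallel_epsilon(epsilons):
    """Parallel composition: mechanisms run on DISJOINT data cost the maximum."""
    return max(epsilons)


# Illustrative values only; none of these numbers come from the paper.
clients = [Client("a", 20000), Client("b", 600), Client("c", 18000), Client("d", 450)]
large, general = split_clients(clients, threshold=5000)

eps_labeling = 2.0   # assumed budget spent on the large-data clients' data
eps_training = 1.0   # assumed fixed budget spent on the general clients' data

# Assumption: the two stages touch disjoint client data, so the overall
# guarantee is governed by the larger of the two budgets.
overall = parallel_epsilon([eps_labeling, eps_training])
```

Under these assumptions, the pseudo-labeling budget can vary without changing the fixed training budget, as long as the larger of the two stays within the total cap. How the paper accounts for the special client's label budget is not stated in the abstract and is not modeled here.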
Authors
TAN Zhiwen (谭智文)
XU Ruzhi (徐茹枝)
WANG Naiyu (王乃玉)
LUO Dan (罗丹)
School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2024, Issue S01, pp. 906-913 (8 pages)
Funding
National Natural Science Foundation of China (61972148).
Keywords
Federated learning
Differential privacy
Knowledge distillation
Privacy protection
Privacy budget