Improving Parameter Estimation and Defensive Ability of Latent Dirichlet Allocation Model Training Under Rényi Differential Privacy

Abstract: Latent Dirichlet allocation (LDA) is a topic model widely used for discovering hidden semantics in massive text corpora. Collapsed Gibbs sampling (CGS), a widely used algorithm for learning the parameters of LDA, carries a risk of privacy leakage. Specifically, the word count statistics and latent topic updates in CGS, which are essential for parameter estimation, can be exploited by adversaries to conduct effective membership inference attacks (MIAs). To date, two kinds of methods have been used in CGS to defend against MIAs: adding noise to word count statistics and utilizing inherent privacy. Each has its limitations. Noise sampled from the Laplace distribution sometimes produces negative word count statistics, which leads to poor parameter estimation in CGS. Utilizing inherent privacy provides only weak privacy guarantees against MIAs. An effective framework that obtains accurate parameter estimates with guaranteed differential privacy is therefore desirable. The key to obtaining accurate parameter estimates when introducing differential privacy into CGS is to make good use of the privacy budget so that a precise noise scale can be derived. We introduce Rényi differential privacy (RDP) into CGS for the first time and propose RDP-LDA, an effective framework for analyzing the privacy loss of any differentially private CGS. RDP-LDA can be used to derive a tighter upper bound on privacy loss than the overestimated bounds that existing differentially private CGS methods obtain under ε-DP. Within RDP-LDA, we propose a novel truncated-Gaussian mechanism that keeps word count statistics non-negative. We also propose distribution perturbation, which provides more rigorous privacy guarantees than utilizing inherent privacy. Experiments validate that the proposed methods produce more accurate parameter estimates under the JS-divergence metric and reduce the precision and recall that MIAs achieve.
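The abstract's point about non-negative noisy counts and the JS-divergence metric can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the rejection-sampling form of truncation, the noise scale `sigma`, and the toy counts are hypothetical and are not taken from the paper, and the sketch does not calibrate noise to any privacy budget or compute an RDP guarantee.

```python
import math
import random


def truncated_gaussian_count(count, sigma, rng):
    """Draw from N(count, sigma^2) restricted to [0, inf) by rejection sampling.

    Assumption: truncation by rejection; the paper's exact mechanism may differ.
    """
    while True:
        noisy = rng.gauss(count, sigma)
        if noisy >= 0.0:
            return noisy


def normalize(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    m = [(a + b) / 2.0 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing to the sum.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0.0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


rng = random.Random(0)              # fixed seed so the run is repeatable
true_counts = [40, 25, 3, 0, 12]    # hypothetical word counts for one topic
noisy_counts = [truncated_gaussian_count(c, sigma=2.0, rng=rng) for c in true_counts]

# Compare the word distribution estimated from noisy counts with the original one.
divergence = js_divergence(normalize(true_counts), normalize(noisy_counts))
print(noisy_counts, divergence)
```

Because every noisy count is non-negative, the counts can be normalized into a valid distribution directly, which is not the case when unbounded Laplace noise pushes a small count below zero.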

Authors: Tao Huang, Su-Yun Zhao, Hong Chen, Yi-Xuan Liu (黄涛, 赵素云, 陈红, 刘艺璇) (Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University of China), Ministry of Education, Beijing 100087, China; School of Information, Renmin University of China, Beijing 100087, China)

Affiliations: Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University of China); School of Information

Source: Journal of Computer Science & Technology (计算机科学技术学报（英文版）), SCIE, EI, CSCD, 2022, Issue 6, pp. 1382-1397 (16 pages)

Funding: the National Natural Science Foundation of China under Grant Nos. 62072460, 62076245, and 62172424, and the Beijing Natural Science Foundation under Grant No. 4212022.

Keywords: latent Dirichlet allocation; parameter estimation; membership inference attack; Rényi differential privacy

Classification code: TP30 [Automation and Computer Technology: Computer System Architecture]
