Abstract
In the age of big data, data providers need to protect their privacy, while data analysts need to mine the latent value of the data, so finding a balance between data privacy and data utility has become an active research topic. Most existing work focuses on the privacy-preserving method itself and overlooks its effect on data utility. Building on an analysis of current research on privacy-utility tradeoff methods, this paper addresses the problem that different public information within the same dataset has different privacy requirements. It proposes a privacy-utility tradeoff method based on multi-variable source coding and gives the corresponding privacy-utility tradeoff region. Analysis and simulations yield two results. First, the stronger the association between the private information and the public information, the more an increase in the distortion of the public information improves privacy preservation. Second, public information corresponding to variables with larger variance can be distorted less, which preserves more of its utility.
Source
《通信学报》
EI
CSCD
PKU Core Journals (北大核心)
2015, No. 12, pp. 172-177 (6 pages)
Journal on Communications
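Funding
National Natural Science Foundation of China (61173017)
Communication Soft Science Fund of the Ministry of Industry and Information Technology (2014-R-42, 2015-R-29)
Open Project Fund of the Key Laboratory of Information Network Security, Ministry of Public Security (C14613)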
Keywords
privacy preservation
privacy-utility tradeoff
source coding
rate distortion