Privacy-preserving data release is an important problem in reconciling data openness with individual privacy. The state-of-the-art approach is differential privacy, which offers a strong privacy guarantee without restrictive assumptions about attackers' background knowledge. For genomic data with very high-dimensional attributes, however, current approaches based on differential privacy are not effective. Specifically, a large amount of noise must be injected into genomic data with tens of millions of SNPs (Single Nucleotide Polymorphisms), which significantly degrades the utility of the released data. To address this problem, this paper proposes a genomic data release method with a differential privacy guarantee. By executing belief propagation on a factor graph, the method factorizes the distribution of sensitive genomic data into a set of local distributions. After differential-privacy noise is injected into these local distributions, synthetic sensitive data are obtained by sampling from the noisy distributions. The synthetic sensitive data and the factor graph are then used to construct an approximate distribution of the non-sensitive data. Finally, non-sensitive genomic data are sampled from this approximate distribution to construct a synthetic genomic dataset.
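The noise-injection and sampling step described in the abstract can be illustrated for a single local distribution. The sketch below is not the paper's method: it omits the factor graph and belief propagation entirely, and it assumes the Laplace mechanism with sensitivity 1 (one individual added or removed changes one count by 1), which the abstract does not specify. The genotype counts, the function names, and the value of epsilon are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and the same scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_distribution(counts, epsilon, rng):
    """Perturb a histogram and renormalize it into a distribution.

    Assumption: L1 sensitivity 1, so the Laplace scale is 1 / epsilon.
    Clamping and normalizing are post-processing steps and do not
    weaken the differential privacy guarantee.
    """
    scale = 1.0 / epsilon
    noisy = {v: max(c + laplace_noise(scale, rng), 0.0) for v, c in counts.items()}
    total = sum(noisy.values())
    if total == 0.0:
        # Every noisy count was clamped to zero: fall back to uniform.
        return {v: 1.0 / len(counts) for v in counts}
    return {v: c / total for v, c in noisy.items()}


def sample_synthetic(dist, n, rng):
    # Draw n synthetic values from the noisy distribution.
    values = list(dist)
    weights = [dist[v] for v in values]
    return rng.choices(values, weights=weights, k=n)


rng = random.Random(0)
# Hypothetical counts for one SNP: number of minor-allele copies (0, 1, 2).
genotype_counts = {0: 620, 1: 310, 2: 70}
dist = noisy_distribution(genotype_counts, epsilon=1.0, rng=rng)
synthetic = sample_synthetic(dist, n=1000, rng=rng)
```

Only the noisy distribution is used to generate the synthetic values, so the sampling step never touches the original counts again.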
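This paper defines a weight-aware utility metric for anonymized tables, which is used in the generalization step to choose the next generalization path and thereby obtain a better generalization result. On this basis, it proposes ABFI (algorithm based on frequent set mining), a privacy-preserving anonymization algorithm for microdata tables that draws on the idea of frequent itemset discovery. The anonymization process generalizes only the quasi-identifier attribute values in equivalence groups that do not satisfy the privacy protection requirement. Experimental results show that the method reduces information loss and yields a locally optimal anonymized table that better meets the needs of data analysis tasks.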
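The idea of generalizing only the equivalence groups that violate the privacy requirement can be sketched as a single pass. This is not ABFI itself: the weighted utility metric and the frequent-itemset search are omitted, the privacy requirement is assumed to be k-anonymity, and the one-level age hierarchy, the record layout, and all function names are hypothetical.

```python
from collections import defaultdict


def generalize_age(age):
    # Hypothetical one-level hierarchy: exact age -> 10-year band.
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def equivalence_groups(rows):
    # Group record indices by their quasi-identifier tuple.
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row].append(i)
    return groups


def generalize_violating_groups(rows, k):
    """One generalization pass over (age, zip_code) records.

    Records in groups of size >= k are left untouched; only records
    in smaller groups have their age generalized. A full algorithm
    would repeat this until every group satisfies the requirement.
    """
    out = [(str(age), zip_code) for age, zip_code in rows]
    for members in equivalence_groups(out).values():
        if len(members) < k:
            for i in members:
                age, zip_code = rows[i]
                out[i] = (generalize_age(age), zip_code)
    return out


records = [(34, "100"), (34, "100"), (31, "100"), (38, "100")]
anonymized = generalize_violating_groups(records, k=2)
```

In this example the two records with age 34 already form a group of size 2 and keep their exact values, while the records with ages 31 and 38 are generalized into the same band and form a new group of size 2.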
Funding: partly supported by the National Natural Science Foundation of China (Nos. 61632010 and 61602129).