Abstract
Two clustering-based l-diversity anonymization algorithms for sensitive attributes are presented. Each cluster the algorithms generate contains at least l distinct sensitive attribute values, and the size of each cluster is between l and 2l-1, in order to achieve the optimal partition and improve data security. The algorithms also build a set of candidate records for each cluster to avoid unnecessary computation and comparisons, and when growing a cluster they always select the record that incurs the least information loss relative to the cluster centroid, which improves efficiency and reduces information loss. Experimental results show that the algorithms are efficient and that the anonymized dataset they produce has high utility.
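The clustering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the record layout, the function names (`centroid`, `loss`, `l_diverse_clusters`), the use of squared Euclidean distance as a stand-in for information loss, the choice of the first remaining record as the seed, and the suppression of leftover records that fit nowhere are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical record layout: (numeric quasi-identifier vector, sensitive value).
Record = Tuple[Tuple[float, ...], str]


def centroid(cluster: List[Record]) -> Tuple[float, ...]:
    """Mean of the quasi-identifier vectors in a non-empty cluster."""
    dims = len(cluster[0][0])
    return tuple(sum(r[0][d] for r in cluster) / len(cluster) for d in range(dims))


def loss(qi: Tuple[float, ...], center: Tuple[float, ...]) -> float:
    """Assumed proxy for information loss: squared distance to the centroid."""
    return sum((a - b) ** 2 for a, b in zip(qi, center))


def l_diverse_clusters(records: List[Record], l: int):
    """Greedy sketch: returns (clusters, suppressed records)."""
    remaining = list(records)
    clusters: List[List[Record]] = []

    # Build clusters while at least l distinct sensitive values are left.
    while len({s for _, s in remaining}) >= l:
        cluster = [remaining.pop(0)]  # assumption: first record is the seed
        while len(cluster) < l:
            seen = {s for _, s in cluster}
            # Candidate set: only records that add a new sensitive value.
            candidates = [r for r in remaining if r[1] not in seen]
            center = centroid(cluster)
            best = min(candidates, key=lambda r: loss(r[0], center))
            cluster.append(best)
            remaining.remove(best)
        clusters.append(cluster)

    # Place leftovers in the nearest cluster that is still below 2l-1 records.
    suppressed: List[Record] = []
    for rec in remaining:
        open_clusters = [c for c in clusters if len(c) < 2 * l - 1]
        if open_clusters:
            nearest = min(open_clusters, key=lambda c: loss(rec[0], centroid(c)))
            nearest.append(rec)
        else:
            suppressed.append(rec)  # assumption: drop records that fit nowhere
    return clusters, suppressed
```

Calling `l_diverse_clusters(records, 2)` on a small list of `(quasi-identifiers, sensitive value)` tuples returns clusters that each hold between l and 2l-1 records and at least l distinct sensitive values; restricting each growth step to the candidate list is what keeps the number of comparisons down.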
Source
《计算机工程与设计》
CSCD
Peking University Core Journal (北大核心)
2010, Issue 20, pp. 4378-4381 (4 pages)
Computer Engineering and Design
Funding
Guangxi Science Foundation Project (桂科基0728033)
Guangxi Higher Education Talent Highland Innovation Team Funding Program (桂教人[2007]71号)
Keywords
privacy disclosure
anonymization
l-diversity
sensitive attribute
clustering