Funding: Supported by the Natural Science Foundation of Inner Mongolia (200308020101), the Education Bureau Foundation of Inner Mongolia (NJ03004), and the Natural Science Foundation of Inner Mongolia University of Technology (200217).
Abstract: Although k-anonymity is a good way of publishing microdata for research purposes, it cannot resist several common attacks, such as attribute disclosure and the similarity attack. To resist these attacks, many refinements of k-anonymity have been proposed, with t-closeness being one of the strictest privacy models. While most existing t-closeness models address the case in which the original data have only a single sensitive attribute, data with multiple sensitive attributes are more common in practice. In this paper, we close this gap by proposing two algorithms that make published data with multiple sensitive attributes satisfy t-closeness. Both rest on the observation that, for the published data to satisfy t-closeness, the sensitive values in every equivalence class must be spread as widely as possible over the entire data. The two algorithms differ in how they partition records into groups by sensitive attributes: one uses a clustering method, while the other uses principal component analysis. Then, according to the similarity of quasi-identifier attributes, records are selected from different groups to construct each equivalence class, which keeps the information loss during anonymization as low as possible. The proposed algorithms are evaluated on a real dataset. The results show that the first algorithm is slower on average than the second but preserves more of the original information. In addition, compared with related approaches, both proposed algorithms achieve stronger privacy protection and lose less information.
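The group-then-select idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: grouping records by their exact tuple of sensitive values stands in for the clustering or PCA step, a single numeric quasi-identifier stands in for a full similarity measure, and t-closeness is measured with the variational distance (the Earth Mover's Distance for categorical attributes under equal ground distance). All function names, attribute names, and records below are hypothetical.

```python
from collections import Counter, defaultdict


def distribution(values):
    """Relative frequency of each value in a list."""
    counts = Counter(values)
    return {v: c / len(values) for v, c in counts.items()}


def variational_distance(p, q):
    """Half the L1 distance between two categorical distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def build_classes(records, qi, sensitive):
    """Greedy sketch: group by sensitive values, then draw one record per group.

    Assumption: exact-value grouping replaces the clustering/PCA partitioning
    described in the abstract; `qi` names one numeric quasi-identifier.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[a] for a in sensitive)].append(rec)

    classes = []
    while groups and all(groups.values()):
        # Seed each class with the smallest remaining QI value in the first group.
        first = next(iter(groups))
        seed = min(groups[first], key=lambda r: r[qi])
        eq_class = []
        for members in groups.values():
            # From every group, take the record most similar to the seed.
            nearest = min(members, key=lambda r: abs(r[qi] - seed[qi]))
            members.remove(nearest)
            eq_class.append(nearest)
        classes.append(eq_class)

    # Records left over from larger groups join the class with the closest seed.
    for rec in [r for members in groups.values() for r in members]:
        min(classes, key=lambda c: abs(c[0][qi] - rec[qi])).append(rec)
    return classes


def max_distance(classes, records, attr):
    """Largest distance between any class and the whole table for one attribute."""
    overall = distribution([r[attr] for r in records])
    return max(
        variational_distance(distribution([r[attr] for r in c]), overall)
        for c in classes
    )


# Hypothetical toy table: one quasi-identifier, two sensitive attributes.
records = [
    {"age": 23, "disease": "flu", "income": "low"},
    {"age": 25, "disease": "cancer", "income": "high"},
    {"age": 41, "disease": "flu", "income": "low"},
    {"age": 44, "disease": "cancer", "income": "high"},
    {"age": 60, "disease": "flu", "income": "low"},
    {"age": 58, "disease": "cancer", "income": "high"},
]
classes = build_classes(records, qi="age", sensitive=["disease", "income"])
for attr in ("disease", "income"):
    print(attr, max_distance(classes, records, attr))
```

The published table would satisfy t-closeness for a given attribute when the value returned by max_distance is at most t. Drawing one record from every group keeps each class's sensitive distribution close to the overall one, while picking the nearest quasi-identifier values keeps the generalization needed within a class small.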