Abstract
Clustering ensemble aims to improve the stability, robustness, and accuracy of clustering by combining multiple clustering results, and it has attracted growing attention in recent years. Most existing clustering ensemble methods treat all base clusterings equally, regardless of their importance. Some work has addressed this, for example by using weighting strategies to improve the co-association matrix, but both the weighting of base clusterings and the evaluation of cluster importance ignore how much individual samples contribute to the clusters they belong to. To address this, a clustering ensemble algorithm based on a sample-pair weighted co-association matrix is proposed. The algorithm first uses k-means to generate multiple base clusterings, then applies k-means again within each cluster to produce several sub-clusters. The importance of a sample pair is evaluated by the degree to which the uncertainty of the cluster changes after the sub-cluster containing that pair is removed. Finally, a hierarchical clustering algorithm produces the final clustering result. Experimental results on six UCI data sets show that the proposed algorithm outperforms three classical clustering ensemble algorithms based on the co-association matrix.
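The pipeline that the abstract builds on can be sketched as follows. This is a minimal illustration of the plain, unweighted co-association baseline only: k-means base clusterings, a co-association matrix, then average-linkage hierarchical clustering. The sample-pair weighting proposed in the paper is not reproduced, because the abstract does not give its formula. All function names, the random choice of k between 2 and 4, and the use of average linkage are assumptions made for illustration.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        # Move each center to the mean of its members; an empty cluster keeps its center.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def co_association(partitions):
    """Entry (i, j) is the fraction of base clusterings that put samples i and j together."""
    n, m = len(partitions[0]), len(partitions)
    return [[sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
            for i in range(n)]

def average_linkage(sim, n_clusters):
    """Agglomerative clustering on a similarity matrix using average linkage."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                s = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or s > best[0]:
                    best = (s, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # merge the most similar pair
    labels = [0] * len(sim)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

def ensemble_cluster(points, n_clusters, n_base=10, seed=0):
    """Unweighted baseline: every base clustering and every sample pair counts equally."""
    rng = random.Random(seed)
    # Assumption: base clusterings differ in k (2 to 4) and in their random initial centers.
    partitions = [kmeans(points, rng.randint(2, 4), rng) for _ in range(n_base)]
    return average_linkage(co_association(partitions), n_clusters)
```

A weighted variant in the spirit of the abstract would replace the equal contribution of each `p[i] == p[j]` term in `co_association` with a per-pair importance score. How that score is computed from the sub-clusters is left open here, since the abstract only describes it qualitatively.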
Authors
Wang Tong (王彤)
Wei Wei (魏巍)
Wang Feng (王锋)
Wang Tong; Wei Wei; Wang Feng (School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China; Key Laboratory of Computational Intelligence and Chinese Information Processing, Ministry of Education, Shanxi University, Taiyuan 030006, China)
Source
Journal of Nanjing University (Natural Science) (《南京大学学报(自然科学版)》)
CAS
CSCD
Peking University Core Journals (北大核心)
2019, No. 4, pp. 592-600 (9 pages)
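Funding
National Natural Science Foundation of China (61772323, 61303008, 61603229, 61502288)
Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi (2016111)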
Keywords
clustering
clustering ensemble
co-association matrix
weighted strategy