Abstract
Aggregating multiple base clusterings by combining independent feature subsets with the idea of connected triples can significantly improve the accuracy of clustering ensemble results. For complex data sets with many features, this paper proposes a clustering ensemble algorithm based on multi-link feature subsets. First, an algorithm for selecting independent feature subsets is proposed, based on the relationships between features. The resulting data subsets serve as the input to the clustering ensemble algorithm, and different clustering algorithms are applied to them to generate a variety of base clusterings. A multi-link ensemble algorithm that can relate different attributes then takes these base clusterings as input and fuses them into a single final result. The algorithm has two advantages: 1) selecting independent feature subsets removes the interference of redundant features with the clustering results, which helps make full use of the available feature information; 2) fusing the base clusterings with the multi-link algorithm to compute the similarity matrix makes it possible to fully exploit the latent relationships between data points. Experiments on different data sets show that the algorithm can improve the accuracy of clustering ensemble results compared with traditional clustering ensemble algorithms.
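To make the fusion step more concrete, here is a minimal sketch of how several base clusterings can be turned into a pairwise similarity matrix and then into one consensus partition. It uses the classic co-association (evidence accumulation) scheme as a stand-in; it is not the paper's multi-link method, and it does not cover the independent feature subset selection. The function names `co_association` and `consensus_clusters`, the threshold of 0.5, and the toy label lists are all assumptions made for illustration.

```python
def co_association(base_labelings):
    """Return an n x n matrix whose entry (i, j) is the fraction of
    base clusterings that place points i and j in the same cluster."""
    m = len(base_labelings)
    n = len(base_labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in base_labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus_clusters(sim, threshold=0.5):
    """Assumed consensus rule: link two points when their similarity
    exceeds the threshold, then label the connected components."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical base clusterings of six points. Cluster ids are
# arbitrary within each clustering; only co-membership matters.
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [2, 2, 2, 0, 1, 1],
]
sim = co_association(base)
print(consensus_clusters(sim))  # [0, 0, 0, 1, 1, 1]
```

In this toy input each base clustering disagrees with the others on some points, yet the majority vote encoded in the similarity matrix recovers two groups of three. A connected-triple approach, as named in the abstract, would go further by also using indirect links rather than only direct co-membership.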
Authors
陈彦萍
高宇坤
张恒山
夏虹
CHEN Yan-ping; GAO Yu-kun; ZHANG Heng-shan; XIA Hong (School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an 710121, China; Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing (Xi'an University of Posts and Telecommunications), Xi'an 710121, China)
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal
2019, Issue 10, pp. 2097-2101 (5 pages)
Journal of Chinese Computer Systems
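Funding
Supported by the National Natural Science Foundation of China (61702414)
Supported by the Shaanxi Province Science and Technology Coordination and Innovation Project (2019ZDLGY07-08)
Supported by the Special Scientific Research Program of the Shaanxi Provincial Department of Education (16JK1701)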
Keywords
feature selection
clustering ensemble
connected triple
latent information