Abstract
Kernel clustering is an effective approach for clustering samples whose differences are weak. Building on rough set theory, this paper applies an attribute reduction algorithm based on attribute importance to kernel clustering and proposes an improved clustering algorithm that achieves high accuracy at low complexity. Before the samples are optimized with a kernel function, their attributes are first processed by the importance-based reduction algorithm, which is itself improved by introducing information entropy, so that redundant attributes are deleted and a better attribute set is obtained. The samples are then clustered with K-means using a soft partition: each sample is assigned to the upper or lower approximation subset of the corresponding cluster center, and because samples in the two approximation subsets influence the cluster to different degrees, they are given different weights that jointly determine the new cluster center. The algorithm thus in effect optimizes the samples twice. Its performance is tested on UCI data sets. Comparison with traditional clustering algorithms shows that the proposed algorithm improves clustering accuracy while reducing complexity, and that convergence speed also improves to some extent.
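The weighted center update described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a Lingras-style rough K-means rule, in which a sample goes to the lower approximation of its nearest cluster unless another center is almost as close, in which case it joins the boundary (upper minus lower) of each such cluster. The function names, the weights `w_lower` and `w_boundary`, and the threshold `ratio` are all hypothetical, and the kernel mapping and attribute reduction steps are omitted.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def rough_kmeans_step(points, centers, w_lower=0.7, w_boundary=0.3, ratio=1.2):
    """One rough K-means iteration (assumed Lingras-style rule, not the paper's).

    Returns (new_centers, lower, boundary), where lower[j] holds samples that
    certainly belong to cluster j and boundary[j] holds ambiguous samples.
    """
    k = len(centers)
    lower = [[] for _ in range(k)]
    boundary = [[] for _ in range(k)]

    for p in points:
        dists = [math.dist(p, c) for c in centers]
        nearest = min(range(k), key=dists.__getitem__)
        # Other clusters whose center is almost as close as the nearest one.
        close = [j for j in range(k)
                 if j != nearest and dists[j] <= ratio * dists[nearest]]
        if close:
            # Ambiguous sample: boundary of every candidate cluster.
            for j in [nearest] + close:
                boundary[j].append(p)
        else:
            # Unambiguous sample: lower approximation of the nearest cluster.
            lower[nearest].append(p)

    new_centers = []
    for j in range(k):
        if lower[j] and boundary[j]:
            lo, bd = mean(lower[j]), mean(boundary[j])
            # Lower-approximation samples get the larger weight.
            new_centers.append(
                tuple(w_lower * a + w_boundary * b for a, b in zip(lo, bd)))
        elif lower[j]:
            new_centers.append(mean(lower[j]))
        elif boundary[j]:
            new_centers.append(mean(boundary[j]))
        else:
            # Empty cluster: keep the previous center.
            new_centers.append(tuple(centers[j]))
    return new_centers, lower, boundary
```

For example, with samples `(0, 0)`, `(0, 1)`, `(10, 0)`, `(10, 1)`, `(5, 0.5)` and initial centers `(0, 0.5)` and `(10, 0.5)`, the midpoint sample is equidistant from both centers and so lands in both boundary regions. It pulls each new center toward the middle, but with less weight than the unambiguous samples.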
Source
Journal of Guangxi Normal University: Natural Science Edition (《广西师范大学学报(自然科学版)》)
CAS
Peking University Core Journals (北大核心)
2011, No. 3, pp. 105-109 (5 pages)
Funding
National Natural Science Foundation of China (60975039)
Jiangsu Province Basic Research Program (Natural Science Foundation) (BK2009093)
Keywords
rough set
attribute reduction
attribute importance
information entropy
kernel clustering