Abstract
Existing rough K-means clustering algorithms select their initial cluster centers at random, and their sample density functions are poorly defined. To address both problems, this paper presents a rough K-means clustering algorithm based on weighted sample density. A new density function is defined for each sample according to how densely samples are packed in the region around it, and the K high-density samples that lie farthest from one another are chosen as the initial cluster centers, which brings the clustering result closer to the global optimum. In addition, when computing cluster means, each sample is assigned a weight according to its density, which yields more reasonable centroids that are not affected by noise points. Experiments on six data sets from the UCI Machine Learning Repository and on randomly generated synthetic data sets containing noise points show that the proposed algorithm produces better clustering results and is highly robust to noisy data.
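The two ideas in the abstract (density-based seeding and density-weighted centroids) can be illustrated with a minimal Python sketch. The abstract does not give the paper's density formula, so the sketch assumes a stand-in: the density of a sample is the number of other samples within a radius equal to the mean pairwise distance. The function names `densities`, `initial_centers`, and `weighted_centroid` are hypothetical, the "high-density" cutoff (at least the mean density) is likewise an assumption, and the lower/upper approximation step of rough K-means is omitted.

```python
import math


def densities(points):
    """Assumed density: number of other samples within the mean pairwise distance.

    This neighbour count is a stand-in; the paper's density function is not
    given in the abstract.
    """
    n = len(points)
    pair = [[math.dist(p, q) for q in points] for p in points]
    # The full matrix counts every unordered pair twice, hence n * (n - 1).
    radius = sum(map(sum, pair)) / (n * (n - 1))
    return [
        sum(1 for j in range(n) if j != i and pair[i][j] <= radius)
        for i in range(n)
    ]


def initial_centers(points, k):
    """Pick k high-density samples that lie far from one another."""
    dens = densities(points)
    mean_density = sum(dens) / len(dens)
    order = sorted(range(len(points)), key=lambda i: -dens[i])
    # Assumed cutoff: "high density" means at least the mean density.
    candidates = [i for i in order if dens[i] >= mean_density]
    if len(candidates) < k:
        candidates = order[:k]
    chosen = [candidates[0]]  # start from the densest sample
    while len(chosen) < k:
        rest = [i for i in candidates if i not in chosen]
        # Greedy farthest-first step: maximise the distance to the nearest chosen center.
        nxt = max(
            rest,
            key=lambda i: min(math.dist(points[i], points[c]) for c in chosen),
        )
        chosen.append(nxt)
    return [points[i] for i in chosen]


def weighted_centroid(members, weights):
    """Cluster mean in which each sample counts in proportion to its weight."""
    total = sum(weights)
    if total == 0:  # fall back to the plain mean if every weight is zero
        weights = [1] * len(members)
        total = len(members)
    dim = len(members[0])
    return tuple(
        sum(w * p[d] for p, w in zip(members, weights)) / total
        for d in range(dim)
    )


# Two tight groups plus one isolated noise point (toy data, not from the paper).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
centers = initial_centers(points, 2)
dens = densities(points)
```

On this toy data the isolated point receives zero density under the assumed definition, so it is neither chosen as a seed nor able to pull a weighted centroid toward itself, which is the noise-resistance behaviour the abstract describes.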
Source
Journal of Shandong University (Natural Science)
CAS
CSCD
PKU Core Journal (北大核心)
2010, No. 7, pp. 1-6 (6 pages)
Funding
Fundamental Research Funds for the Central Universities, Key Project (GK200901006)
Natural Science Basic Research Plan of Shaanxi Province (2010JM3004)
Keywords
clustering algorithm
rough K-means
clustering center
weight
density