
K-medoids Clustering Algorithms with Optimized Initial Seeds by Density Peaks (密度峰值优化初始中心的K-medoids聚类算法)

Cited by: 27
Abstract: The fast K-medoids and the variance-based K-medoids clustering algorithms require the number of clusters to be specified manually, and their initial seeds may fall in the same cluster or fail to cover all clusters of a dataset. To overcome these deficiencies, and inspired by the density peak clustering algorithm, this paper proposes two K-medoids clustering algorithms that determine the number of clusters adaptively. The algorithms define the local density ρi of a point xi as the reciprocal of the sum of the distances between xi and its t nearest neighbors, define a new distance δi for xi, and plot the decision graph of point distance against local density. Points with high local density that lie far apart from one another appear in the upper right corner of the decision graph, away from most other points in the dataset. These points are chosen as the initial seeds for K-medoids, so that the seeds lie in different clusters and the number of clusters is determined automatically as the number of seeds. To further improve the clustering, a new measure function is proposed: the ratio of the intra-cluster distance to the inter-cluster distance. The two algorithms are tested on real datasets from the UCI machine learning repository and on synthetic datasets. The results are evaluated in terms of the initial seeds selected, number of iterations, clustering time, Rand index, Jaccard coefficient, Adjusted Rand index, and clustering accuracy. The experiments show that the proposed algorithms can recognize the true number of clusters of a dataset, find proper initial seeds, reduce the number of clustering iterations and the clustering time, improve clustering accuracy, and are robust to noise.
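The seed-selection steps described in the abstract can be illustrated with a minimal Python sketch. Only the definition of ρi (reciprocal of the sum of the t nearest-neighbor distances) comes from the abstract. The other parts are assumptions made for illustration. δi follows the usual density-peaks convention (distance to the nearest denser point). The thresholds `rho_min` and `delta_min` stand in for reading the upper right corner of the decision graph. The criterion uses point-to-medoid distances for the intra-cluster term and medoid-to-medoid distances for the inter-cluster term. All function names are hypothetical and are not taken from the paper.

```python
import math

def pairwise_distances(points):
    """Euclidean distance matrix as a list of lists."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def local_density(dist, t, eps=1e-12):
    """rho_i = 1 / (sum of distances from x_i to its t nearest neighbors)."""
    rho = []
    for i, row in enumerate(dist):
        nearest = sorted(d for j, d in enumerate(row) if j != i)[:t]
        rho.append(1.0 / max(sum(nearest), eps))  # eps guards duplicate points
    return rho

def delta_distance(dist, rho):
    """ASSUMED definition: distance to the nearest denser point; the densest
    point gets its largest distance. Ties in rho are broken by index."""
    n = len(rho)
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        delta.append(min(denser) if denser else max(dist[i]))
    return delta

def pick_seeds(rho, delta, rho_min, delta_min):
    """ASSUMED rule: keep points in the upper right of the decision graph."""
    return [i for i in range(len(rho))
            if rho[i] >= rho_min and delta[i] >= delta_min]

def assign_to_medoids(dist, medoids):
    """Label each point with the position of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda k: dist[i][medoids[k]])
            for i in range(len(dist))]

def intra_inter_ratio(dist, medoids, labels):
    """ASSUMED criterion: summed point-to-medoid distance divided by
    summed medoid-to-medoid distance (smaller is better)."""
    intra = sum(dist[i][medoids[labels[i]]] for i in range(len(dist)))
    inter = sum(dist[a][b] for idx, a in enumerate(medoids)
                for b in medoids[idx + 1:])
    return intra / inter

# Toy data: two well-separated groups, each a unit square plus its center.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
dist = pairwise_distances(points)
rho = local_density(dist, t=2)
delta = delta_distance(dist, rho)
seeds = pick_seeds(rho, delta, rho_min=0.6, delta_min=5.0)
labels = assign_to_medoids(dist, seeds)
print(seeds, round(intra_inter_ratio(dist, seeds, labels), 3))
```

On this toy data the two square centers are selected as seeds, so the number of clusters (two) falls out of the seed count rather than being supplied by the caller. A full K-medoids run would then iterate medoid updates from these seeds, which the sketch omits.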
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD, Peking University Core Journal, 2016, No. 2, pp. 230-247 (18 pages)
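Funding: National Natural Science Foundation of China, No. 31372250; Shaanxi Province Science and Technology Key Project, No. 2013K12-03-24; Fundamental Research Funds for the Central Universities, No. GK201503067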
Keywords: clustering; K-medoids algorithm; initial seeds; density peak; measure function