期刊文献+

相对邻域与剪枝策略优化的密度峰值聚类算法 被引量:15

Relative Neighborhood and Pruning Strategy Optimized Density Peaks Clustering Algorithm
下载PDF
导出
摘要 针对Science发表的密度峰值聚类(Density peaks clustering,DPC)算法及其改进算法效率不高的缺陷,提出一种相对邻域和剪枝策略优化的密度峰值聚类(Relative neighborhood and pruning strategy optimized DPC,RP-DPC)算法.DPC聚类算法主要有两个阶段:聚类中心点的确定和非聚类中心点样本的类簇分配,并且时间复杂度集中在第1个阶段,因此RP-DPC算法针对该阶段做出改进研究.RP-DPC算法去掉了DPC算法预先计算距离矩阵的步骤,首先利用相对距离将样本映射到相对邻域中,再从相对邻域来计算各样本的密度,从而缩小各样本距离计算及密度统计的范围;然后在计算各样本的δ值时加入剪枝策略,将大量被剪枝样本δ值的计算范围从样本集缩小至邻域以内,极大地提高了算法的效率.理论分析和在人工数据集及UCI数据集的对比实验均表明,与DPC算法及其改进算法相比,RP-DPC算法在保证聚类质量的同时可以实现有效的时间性能提升. In order to overcome the low efficiency defect of density peaks clustering(DPC)algorithm published in Science and its improvement algorithms,a new relative neighborhood and pruning strategy optimized DPC(RP-DPC)algorithm is proposed in this paper.There are two main phases in DPC:determination of cluster centers and cluster assignation for remaining samples.The time complexity of DPC is determined by the first phase,so the improvements for the determination of cluster centers are proposed in this paper.Firstly,the RP-DPC algorithm maps samples to their relative neighborhoods,then computes the local density of every sample on the basis of relative neighborhood.This method shrinks the range of distance computing and density counting of every sample,thus avoiding a lot of unnecessary distance calculations.Secondly,the pruning strategy is led into theδvalue computing of every sample,which restricts theδcomputing of massive pruned samples to within their own neighborhoods,so as to greatly improve the efficiency.We demonstrate that:our RP-DPC algorithm can improve the time performance significantly on the basis of having same clustering quality compared with the DPC algorithm and its improvement algorithms through the theory analysis and the experiments on several popular test cases that include both synthetic and real-world data sets from the UCI machine learning repository.
作者 纪霞 姚晟 赵鹏 JI Xia;YAO Sheng;ZHAO Peng(School of Computer Science and Technology,Anhui University,Hefei 230601;Key Laboratory of Intelligent Computing and Signal Processing of the Ministry of Education,Anhui University,Hefei 230601)
出处 《自动化学报》 EI CSCD 北大核心 2020年第3期562-575,共14页 Acta Automatica Sinica
基金 国家自然科学基金(61602004,61672034) 安徽省重点研究与开发计划(1804d8020309) 安徽省自然科学基金(1708085MF160,1908085MF188) 安徽省高等学校自然科学研究重点项目(KJ2016A 041,KJ2017A011) 安徽大学信息保障技术协同创新中心公开招标课题(ADXXBZ201605)资助。
关键词 聚类算法 密度峰值 相对邻域 剪枝策略 Clustering algorithm density peaks relative neighborhood pruning strategy
  • 相关文献

参考文献2

二级参考文献40

  • 1Han J W, Kamber M. Data Mining Concepts and Techniques. 2nd ed. New York:Elsevier Inc, 2006. 383-424. 被引量:1
  • 2Jain A K. Data clustering:50 years beyond K-means. Pattern Recogn Lett, 2010, 31:651-666. 被引量:1
  • 3Williamson B, Guyon I. Clustering:science or art?. J Mach Learn Res, 2012, 27:65-80. 被引量:1
  • 4Frey B J, Dueck D. Clustering by passing messages between data points. Science, 2007, 315:972-976. 被引量:1
  • 5Rodri?uez A, Laio A. Clustering by fast search and find of density peaks. Science, 2014, 344:1492-1496. 被引量:1
  • 6Xu R, Wunsch D. Survey of clustering algorithms. IEEE Trans Neural Netw Learn Syst, 2005, 16:645-678. 被引量:1
  • 7McQueen J. Some methods for classification and analysis of multivariate observations. In:Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability. Los Angeles:University of California, 1967. 281-297. 被引量:1
  • 8Likas A, Vlassis N, Verbeek J J. The global K-means clustering algorithm. Pattern Recogn, 2003, 36:451-464. 被引量:1
  • 9Xie J Y, Jiang S, Xie W, et al. An efficient global K-means clustering algorithm. J Comput, 2011, 6:271-279. 被引量:1
  • 10Ester M, Kriegel H P, Sander J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In:Proceedings of ACM SIGKDD'96, Portland, 1996. 226-231. 被引量:1

共引文献114

同被引文献106

引证文献15

二级引证文献46

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部