摘要
针对Science发表的密度峰值聚类(Density peaks clustering,DPC)算法及其改进算法效率不高的缺陷,提出一种相对邻域和剪枝策略优化的密度峰值聚类(Relative neighborhood and pruning strategy optimized DPC,RP-DPC)算法.DPC聚类算法主要有两个阶段:聚类中心点的确定和非聚类中心点样本的类簇分配,并且时间复杂度集中在第1个阶段,因此RP-DPC算法针对该阶段做出改进研究.RP-DPC算法去掉了DPC算法预先计算距离矩阵的步骤,首先利用相对距离将样本映射到相对邻域中,再从相对邻域来计算各样本的密度,从而缩小各样本距离计算及密度统计的范围;然后在计算各样本的δ值时加入剪枝策略,将大量被剪枝样本δ值的计算范围从样本集缩小至邻域以内,极大地提高了算法的效率.理论分析和在人工数据集及UCI数据集的对比实验均表明,与DPC算法及其改进算法相比,RP-DPC算法在保证聚类质量的同时可以实现有效的时间性能提升.
In order to overcome the low efficiency defect of density peaks clustering(DPC)algorithm published in Science and its improvement algorithms,a new relative neighborhood and pruning strategy optimized DPC(RP-DPC)algorithm is proposed in this paper.There are two main phases in DPC:determination of cluster centers and cluster assignation for remaining samples.The time complexity of DPC is determined by the first phase,so the improvements for the determination of cluster centers are proposed in this paper.Firstly,the RP-DPC algorithm maps samples to their relative neighborhoods,then computes the local density of every sample on the basis of relative neighborhood.This method shrinks the range of distance computing and density counting of every sample,thus avoiding a lot of unnecessary distance calculations.Secondly,the pruning strategy is led into theδvalue computing of every sample,which restricts theδcomputing of massive pruned samples to within their own neighborhoods,so as to greatly improve the efficiency.We demonstrate that:our RP-DPC algorithm can improve the time performance significantly on the basis of having same clustering quality compared with the DPC algorithm and its improvement algorithms through the theory analysis and the experiments on several popular test cases that include both synthetic and real-world data sets from the UCI machine learning repository.
作者
纪霞
姚晟
赵鹏
JI Xia;YAO Sheng;ZHAO Peng(School of Computer Science and Technology,Anhui University,Hefei 230601;Key Laboratory of Intelligent Computing and Signal Processing of the Ministry of Education,Anhui University,Hefei 230601)
出处
《自动化学报》
EI
CSCD
北大核心
2020年第3期562-575,共14页
Acta Automatica Sinica
基金
国家自然科学基金(61602004,61672034)
安徽省重点研究与开发计划(1804d8020309)
安徽省自然科学基金(1708085MF160,1908085MF188)
安徽省高等学校自然科学研究重点项目(KJ2016A 041,KJ2017A011)
安徽大学信息保障技术协同创新中心公开招标课题(ADXXBZ201605)资助。
关键词
聚类算法
密度峰值
相对邻域
剪枝策略
Clustering algorithm
density peaks
relative neighborhood
pruning strategy