摘要
目前经典的聚类算法在内存空间有限的情况下,聚类受到时间、空间等各方面的限制,提出一种基于代表点的快速聚类算法FCBRP(fast clustering based representative points).首先,判定数据集中所有节点的属性,当节点的D临域内存在大于等于K个邻居节点时,将其定义为代表点,代表点D临域内所有邻居节点与该代表点之间的平均欧氏距离即为该代表点的相关密度RD,所有的代表点组成代表点集合;将所有在代表点的D临域内的节点定义为能被代表的节点,并将其进行存储;既不是代表点、又不能被其它节点所代表的节点,将其定义为噪音节点;其次,对代表点集合进行聚类,对于给定的密度标准α,如果两个代表点满足密度相关,即两个代表点的相关密度分别乘以密度标准α后同时大于等于两者之间的欧氏距离,则将其划分到同一类簇中,通过对代表点的聚类,达到对数据的区域划分,得到所有类簇的基本形状;最后,对于被其它代表点所代表的节点,通过检测代表它们的代表点所属的类簇,判定被代表的节点所属的类簇,对于少数位于不同类簇中的代表点的D临域内的节点,将其划分到相对距离较近的代表点所属的类簇中.实验证明,FCBRP算法对空间需求较小,效率快,精度高,鲁棒性更佳.
The existing classical clustering algorithms' effect is not good enough by various restrictions in limited memory, such as time complexity and space complexity, this paper presents an algorithm-fast clustering based representative points(FCBRP). Firstly, judge the attribute of the points of the dataset, if the point that has more than K neighbors in its radius of D, define it representative point, the representative point's related density RD is defined that the average Euclidean distance from the representative point to the neighbors of it, all the representative points constitute the set of representative. Define the point represented-point which is the neighbor of the representative points and store all of them. Define the other points noises. Secondly, cluster the set ofrepresentative, as the given density standard a, if the distance ot the two representanve points is DDtll 811I^tllt:l tlia** their related density RD multiplied the density standard a, define them belong one cluster, by cluster the representative points to separate the data region, in order to get the shape of the clusters. Finally, find the cluster number of the points which can be represented by other representative points, that is to say,find the cluster number of their representative points, as the represented-points those are the neighbors of representative points which are belong to deferent clusters, separate them to the cluster of representative point which are closer to the represented- point. The experiment result proved that the FCBRP algorithm is less space required, high-efficiency, high precision, more robust.
出处
《南京大学学报(自然科学版)》
CAS
CSCD
北大核心
2012年第4期504-512,共9页
Journal of Nanjing University(Natural Science)
基金
教育部高等学校博士学科点专项科研基金(20110095110010)
关键词
代表点选取
代表点聚类
FCBRP算法
the attribute of the points, the seiective of representative points, the clustering of representativepoints, fast clustering based representative points