Abstract
To cluster large-scale data objects quickly and effectively, a gravitational clustering algorithm based on representative points with mass, GCARM, is proposed. The algorithm first scans the dataset and uses a K-ary tree (K-tree) structure to agglomerate nearby objects into representative points with mass. It then computes the universal gravitation between representative points and connects those whose mutual attraction exceeds a preset threshold; each maximal connected set of objects forms a cluster. Experimental results show that GCARM can identify clusters of arbitrary shape and size and remove noise while maintaining accuracy, and that it offers high efficiency and scalability.
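The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: a one-pass nearest-leader absorption within a hypothetical `radius` stands in for the K-ary tree, mass is taken to be the number of absorbed objects, the force is computed as m_i * m_j / d^2 with the gravitational constant set to 1, and components whose total mass falls below a hypothetical `min_mass` are labeled as noise (-1). The names `build_representatives`, `gravity_clusters`, and `gcarm_sketch` are likewise invented here.

```python
import math
from collections import defaultdict


def euclid(p, q):
    """Euclidean distance between two coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def build_representatives(points, radius):
    """Stage 1 (assumed stand-in for the K-ary tree): absorb each point into
    the nearest representative within `radius`, else start a new one."""
    reps = []        # each entry: [centroid, mass]
    assignment = []  # representative index for every input point
    for p in points:
        best, best_d = None, radius
        for i, (c, _) in enumerate(reps):
            d = euclid(p, c)
            if d <= best_d:
                best, best_d = i, d
        if best is None:
            reps.append([list(p), 1])
            assignment.append(len(reps) - 1)
        else:
            c, m = reps[best]
            # Running mean keeps the centroid at the center of absorbed mass.
            reps[best][0] = [(ci * m + pi) / (m + 1) for ci, pi in zip(c, p)]
            reps[best][1] = m + 1
            assignment.append(best)
    return reps, assignment


def gravity_clusters(reps, force_threshold, min_mass):
    """Stage 2: link representatives whose attraction exceeds the threshold
    and label each connected component; light components become noise."""
    parent = list(range(len(reps)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(reps)):
        for j in range(i + 1, len(reps)):
            d = euclid(reps[i][0], reps[j][0])
            # Assumed force law: m_i * m_j / d^2, with G = 1.
            force = reps[i][1] * reps[j][1] / (d * d) if d > 0 else math.inf
            if force > force_threshold:
                parent[find(i)] = find(j)

    total_mass = defaultdict(int)
    for i, (_, m) in enumerate(reps):
        total_mass[find(i)] += m

    cluster_ids = {}
    rep_labels = []
    for i in range(len(reps)):
        root = find(i)
        if total_mass[root] < min_mass:
            rep_labels.append(-1)  # treated as noise (assumption)
        else:
            rep_labels.append(cluster_ids.setdefault(root, len(cluster_ids)))
    return rep_labels


def gcarm_sketch(points, radius, force_threshold, min_mass=2):
    """Return one cluster label per input point (-1 marks noise)."""
    reps, assignment = build_representatives(points, radius)
    rep_labels = gravity_clusters(reps, force_threshold, min_mass)
    return [rep_labels[a] for a in assignment]
```

Calling `gcarm_sketch(points, radius, force_threshold)` on a list of coordinate tuples returns one label per point. The pairwise force loop is quadratic in the number of representatives, not in the number of raw objects, which is the reason for condensing objects into massive representative points first.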
Source
《南开大学学报(自然科学版)》 (Journal of Nankai University, Natural Science Edition)
CAS
CSCD
PKU Core (Peking University Core Journals)
2016, No. 4, pp. 8-15 (8 pages)
Acta Scientiarum Naturalium Universitatis Nankaiensis
Funding
Key Scientific Research Project of Higher Education Institutions of Henan Province (15A520089)
Keywords
clustering
mass
representative point
universal gravitation