摘要
为了减少大规模数据集在聚类过程中的计算复杂度和运行时间,本文提出了一种基于自适应网格划分和决策图的聚类算法AGPCA.首先,采用相对熵自适应划分数据空间,形成明显的稀疏网格和稠密网格.将网格作为聚类对象,降低以点为对象之间的距离计算复杂度.之后,依据决策图思想确定簇心网格对象,并通过Kd树完成邻接网格的查找和合并以实现聚类.以多个标准数据集和真实的出租车GPS轨迹数据作为测试对象,并与现有一些先进的聚类算法进行对比实验.实验结果表明所提算法结合了网格划分和局部距离判断的优点,具有较高的准确性和运行效率.
In order to reduce the computational complexity and running time in the clustering process for massive data sets,a novel clustering algorithm named AGPCA,based on adaptive grid partition and decision graph is proposed in this paper. Firstly,relative entropy is used to divide the dataset space adaptively to form many obvious sparse grids and dense grids. Then,these partitioned grids instead of the data set of points are regarded as clustering objects to reduce the complexity of distance computation. Next,the cluster center grids are determined according to the decision graph,and the clustering process is implemented by the adjacency grid inquiry and merging based on KD tree index structure. The AGPCA algorithm compared with state-of-the-art clustering methods using several standard data sets and the ground truth. Experimental results show that the proposed algorithm combines the advantages of grid partition and local distance judgment,ow ning higher accuracy and efficiency for large-scale data processing.
作者
蔡莉
江芳
许卫霞
梁宇
CAI Li;JIANG Fang;XU Wei-xia;LIANG Yu(School of Computer Science,Fudan University,Shanghai 200433,China;School of Software,Yunnan University,Kunming 650091,China)
出处
《小型微型计算机系统》
CSCD
北大核心
2019年第10期2033-2038,共6页
Journal of Chinese Computer Systems
基金
国家自然科学基金项目(61663047)资助
关键词
自适应网格划分
决策图
聚类算法
相对熵
adaptive grid partition
decision diagram
clustering algorithm
relative entropy