Abstract
The DBSCAN (density-based spatial clustering of applications with noise) algorithm becomes increasingly time-consuming as data volume grows. To address this problem, this paper proposes an improved DBSCAN algorithm based on a K-dimensional (KD) tree, hereafter referred to as KD-DBSCAN. A KD tree partitions the data set and is used to construct the neighborhood set of each object, so that noise points and core points are distinguished in advance. This avoids computing neighborhood sets for noise points during clustering and speeds up neighborhood queries for core points. Using floating-car GPS data as experimental data, the paper compares the traditional DBSCAN algorithm and KD-DBSCAN in terms of clustering quality and running time. The experimental results show that KD-DBSCAN produces clusters essentially consistent with those of traditional DBSCAN while substantially reducing running time.
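The idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`build_kdtree`, `radius_query`, `kd_dbscan`), the parameters `eps` and `min_pts`, and the toy coordinates are all assumptions made for illustration. The sketch builds a KD tree, precomputes every point's eps-neighborhood with it, marks core points up front, and then expands clusters from core points only, so points that never join a cluster keep the noise label without any further neighborhood computation.

```python
from math import dist  # Euclidean distance, Python 3.8+


def build_kdtree(points, idxs=None, depth=0):
    """Return a nested (index, axis, left, right) tuple, or None if empty."""
    if idxs is None:
        idxs = list(range(len(points)))
    if not idxs:
        return None
    axis = depth % len(points[0])          # cycle through the dimensions
    idxs = sorted(idxs, key=lambda i: points[i][axis])
    mid = len(idxs) // 2                   # median point becomes the split node
    return (idxs[mid], axis,
            build_kdtree(points, idxs[:mid], depth + 1),
            build_kdtree(points, idxs[mid + 1:], depth + 1))


def radius_query(node, points, q, eps, out):
    """Append to `out` the indices of all points within distance eps of q."""
    if node is None:
        return
    i, axis, left, right = node
    if dist(points[i], q) <= eps:
        out.append(i)
    diff = q[axis] - points[i][axis]
    # A subtree is visited only if the eps-ball around q can reach its side.
    if diff <= eps:
        radius_query(left, points, q, eps, out)
    if diff >= -eps:
        radius_query(right, points, q, eps, out)


def kd_dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    tree = build_kdtree(points)
    neighbors = []
    for p in points:                       # neighborhood sets, built once
        found = []
        radius_query(tree, points, p, eps, found)
        neighbors.append(found)
    # A point counts itself as a neighbor (assumed convention).
    is_core = [len(n) >= min_pts for n in neighbors]

    labels = [-1] * len(points)            # -1 = noise unless reached later
    cluster = 0
    for seed in range(len(points)):
        if not is_core[seed] or labels[seed] != -1:
            continue                       # only unlabelled core points seed
        labels[seed] = cluster
        stack = [seed]
        while stack:
            cur = stack.pop()
            for nb in neighbors[cur]:
                if labels[nb] == -1:
                    labels[nb] = cluster
                    if is_core[nb]:        # border points are not expanded
                        stack.append(nb)
        cluster += 1
    return labels


# Hypothetical toy data: two tight groups and one isolated point.
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
print(kd_dbscan(pts, eps=1.5, min_pts=3))  # -> [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this sketch a non-core point is treated as noise only provisionally: it is relabelled as a border point if a core point reaches it, which follows the standard DBSCAN definitions rather than anything stated in the abstract.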
Authors
陈文龙
时宏伟
CHEN Wen-Long; SHI Hong-Wei (College of Computer Science, Sichuan University, Chengdu 610065, China)
Source
《计算机系统应用》
2022, Issue 2, pp. 305-310 (6 pages)
Computer Systems & Applications
Keywords
clustering
DBSCAN (density-based spatial clustering of applications with noise) algorithm
KD tree