
Big data clustering method based on improved K-means++ and DBSCAN (cited by: 7)
Abstract: To improve processing performance on large-scale data sets, a big data clustering method based on improved K-means++ and the density-based spatial clustering of applications with noise (DBSCAN) algorithm is proposed. First, K-means++ is combined with a local search strategy to produce an initial partition of the data set, and DBSCAN is then run separately within each partition. The improved K-means++ algorithm raises the quality of data pre-processing, and partitioning the data and clustering the partitions in parallel significantly reduces the computational burden of DBSCAN and speeds up processing. Finally, a two-stage pruning strategy merges the border clusters efficiently. Experimental results show that the proposed method greatly reduces the execution time of DBSCAN while keeping clustering quality very close to that of the original DBSCAN algorithm. On the Bitcoin data set from the UCI repository, its clustering efficiency is more than 10 times higher than that of the other compared methods, and it achieves the best balance between processing time and clustering quality.
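The partition, cluster, and merge pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the local search refinement of K-means++ is omitted, the partitions are processed sequentially rather than in parallel, and the two-stage pruning merge is replaced by a brute-force check that joins clusters from different partitions when any two of their points lie within `eps`. All function names and the toy data are assumptions made for illustration. Because DBSCAN only sees one partition at a time, points near a partition boundary may be labelled differently than in a global DBSCAN run.

```python
import math
import random

def kmeanspp_seeds(points, k, rng):
    """K-means++ seeding: each new center is drawn with probability
    proportional to the squared distance to the nearest chosen center.
    Assumes the data contains at least k distinct points."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

def partition(points, centers):
    """Assign every point index to its nearest center."""
    parts = [[] for _ in centers]
    for i, p in enumerate(points):
        j = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        parts[j].append(i)
    return parts

def dbscan(points, idxs, eps, min_pts):
    """Plain DBSCAN restricted to the indices in idxs.
    Returns {index: local label}, with -1 marking noise."""
    def neighbors(i):
        return [j for j in idxs if math.dist(points[i], points[j]) <= eps]
    labels, cluster = {}, 0
    for i in idxs:
        if i in labels:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1          # noise for now; may become a border point
            continue
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels.get(j, -1) != -1:
                continue            # already assigned to a cluster
            unvisited = j not in labels
            labels[j] = cluster
            if unvisited:
                jn = neighbors(j)
                if len(jn) >= min_pts:
                    queue.extend(jn)  # j is a core point: keep expanding
        cluster += 1
    return labels

def cluster_partitioned(points, k, eps, min_pts, seed=0):
    """Partition with K-means++ seeds, run DBSCAN per partition, then merge."""
    rng = random.Random(seed)
    parts = partition(points, kmeanspp_seeds(points, k, rng))
    label = {}                      # index -> (partition, local label) or None
    for p, idxs in enumerate(parts):
        for i, l in dbscan(points, idxs, eps, min_pts).items():
            label[i] = (p, l) if l != -1 else None
    parent = {c: c for c in label.values() if c is not None}
    def find(c):
        while parent[c] != c:
            c = parent[c]
        return c
    clustered = [i for i in label if label[i] is not None]
    # Brute-force stand-in for the merge step (no pruning).
    for a in clustered:
        for b in clustered:
            if label[a][0] < label[b][0] and math.dist(points[a], points[b]) <= eps:
                parent[find(label[a])] = find(label[b])
    roots = {}
    return [-1 if label[i] is None else roots.setdefault(find(label[i]), len(roots))
            for i in range(len(points))]

# Toy data: two tight blobs and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 20)]
labels = cluster_partitioned(points, k=2, eps=1.5, min_pts=3)
print(labels)
```

With `k=1` the sketch reduces to ordinary DBSCAN over the whole data set, which gives a simple baseline for comparing the partitioned result.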
Authors: Zhang Yuqin; Liang Li; Zhang Jianliang; Feng Xiangdong (College of the Engineering & Technical, Chengdu University of Technology, Leshan 614000, China; School of Mathematics and Physics, Chengdu University of Technology, Chengdu 610059, China)
Source: Foreign Electronic Measurement Technology (Peking University core journal), 2022, No. 9, pp. 40-46 (7 pages)
Funding: Supported by the Sichuan Provincial Natural Science Key Project (18ZA0075, 18ZA0073), the Leshan Science and Technology Bureau Key Research Project (21GZD015), and the Fund of the Engineering & Technical College of Chengdu University of Technology (C122019027).
Keywords: big data; data clustering; DBSCAN; K-means++; local search