
Parallel computation algorithm for big data clustering based on MapReduce (cited by: 20)
Abstract: To address the large scale and computational complexity of big data, this paper adopts a two-stage progressive clustering approach on the MapReduce framework and proposes an improved, parallelized K-means method for big data clustering. In the first stage, the Canopy algorithm initializes the partition of cluster centers, quickly yielding coarse cluster center points. In the second stage, a MapReduce-based parallel computation scheme has each data point undergo refined clustering or merging around its nearby Canopy center, so that big data can be clustered quickly and accurately. The algorithm was validated on the MapReduce parallel framework; the experimental results show that it effectively improves parallel computing efficiency, reduces computation time, and improves clustering accuracy on big data.
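The two stages described in the abstract can be illustrated with a small in-process sketch. This is not the authors' implementation: it runs in plain Python rather than on a MapReduce cluster, it uses a simplified single-threshold Canopy pass, and it omits the merging step. The function names and the `t2` threshold parameter are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def canopy_centers(points, t2):
    """Stage 1 (simplified Canopy): pick coarse centers.

    Takes the first remaining point as a center, then drops every
    point within distance t2 of it. t2 is an assumed tuning parameter.
    """
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining[0]
        centers.append(center)
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return centers


def map_assign(point, centers):
    """Map step: emit (index of the nearest center, point)."""
    nearest = min(range(len(centers)),
                  key=lambda i: math.dist(point, centers[i]))
    return nearest, point


def reduce_mean(points):
    """Reduce step: average all points that share one center index."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_round(points, centers):
    """One map/shuffle/reduce round, simulated in a single process."""
    groups = defaultdict(list)  # the "shuffle": group values by key
    for p in points:
        key, value = map_assign(p, centers)
        groups[key].append(value)
    # Centers that attracted no points are dropped in this sketch.
    return [reduce_mean(groups[k]) for k in sorted(groups)]


def cluster(points, t2, max_iter=20, tol=1e-9):
    """Stage 2: refine the Canopy centers with K-means rounds."""
    centers = canopy_centers(points, t2)
    for _ in range(max_iter):
        updated = kmeans_round(points, centers)
        converged = len(updated) == len(centers) and all(
            math.dist(a, b) <= tol for a, b in zip(updated, centers))
        centers = updated
        if converged:
            break
    return centers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(cluster(data, t2=3.0))
```

In a real deployment the map and reduce functions would run as distributed tasks over partitioned data; the point of the sketch is only that seeding K-means from Canopy centers fixes the number of clusters and gives the refinement rounds a reasonable starting point.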
Authors: Zhang Wenjie; Jiang Liehui (Faculty of Cyberspace Security, PLA Information Engineering University, Zhengzhou 450001, China; State Key Laboratory of Mathematical Engineering & Advanced Computing, Zhengzhou 450001, China)
Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2020, No. 1, pp. 53-56 (4 pages)
Funding: Henan Province Basic and Frontier Research Project (142300410090); Henan Province Science and Technology Research Program (162102210035).
Keywords: big data; MapReduce; parallel computation; data clustering