
An efficient K-means parallel algorithm based on MapReduce (cited by: 3)
Abstract: K-means depends heavily on the choice of initial centers, converges slowly, has limited clustering accuracy, and runs into a memory bottleneck when processing massive data sets. To address these problems, an efficient parallel K-means algorithm based on MapReduce is proposed. Built on the MapReduce framework, the algorithm performs parallel sampling combined with a K-selection sort to improve sampling efficiency; it obtains the initial centers through a sample-preprocessing strategy; and it updates the centers in each iteration with a weight-replacement strategy. In addition, tuning the Hadoop cluster further improves the running efficiency of the algorithm. Experimental results show that the algorithm achieves good convergence, accuracy, and speedup, and that its overall performance is further improved.
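
For orientation, the following is a minimal single-process sketch of how one plain K-means iteration can be split into map and reduce steps. It is not the authors' algorithm: the parallel sampling, K-selection sort, sample preprocessing, and weight-replacement update named in the abstract are not specified in this record and are not reproduced here. The function names (`map_assign`, `combine`, `reduce_update`, `kmeans`) and the toy data are assumptions made for illustration, and no Hadoop API is used.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def map_assign(point, centers):
    """Map step: emit (index of the nearest center, (point, 1))."""
    nearest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return nearest, (point, 1)


def combine(pairs):
    """Shuffle/combine step: sum coordinates and counts per center index."""
    sums = {}
    for idx, (point, count) in pairs:
        if idx in sums:
            acc, n = sums[idx]
            sums[idx] = ([a + p for a, p in zip(acc, point)], n + count)
        else:
            sums[idx] = (list(point), count)
    return sums


def reduce_update(sums, centers):
    """Reduce step: each new center is the mean of its assigned points.

    A center that received no points keeps its old position.
    """
    new_centers = []
    for i, old in enumerate(centers):
        if i in sums:
            acc, n = sums[i]
            new_centers.append(tuple(a / n for a in acc))
        else:
            new_centers.append(tuple(old))
    return new_centers


def kmeans(points, centers, max_iter=20, tol=1e-9):
    """Repeat map -> combine -> reduce until the centers stop moving."""
    for _ in range(max_iter):
        pairs = [map_assign(p, centers) for p in points]
        new_centers = reduce_update(combine(pairs), centers)
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers


# Toy data (hypothetical): two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centers = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centers)  # [(0.0, 1.0), (10.0, 11.0)]
```

In a real MapReduce job the map step would run on separate data splits and the per-center sums would be merged across nodes, which is why the sketch emits partial sums and counts rather than computing means directly.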
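Authors: Wang Yonggui (王永贵), Cui Peng (崔鹏)
Source: Journal of Liaoning Technical University (Natural Science) (《辽宁工程技术大学学报(自然科学版)》), indexed in CAS and the Peking University Core Journals list, 2017, No. 11, pp. 1204-1211 (8 pages)
Funding: National Natural Science Foundation of China (61404069); Doctoral Start-up Fund of the Liaoning Provincial Department of Science and Technology (20141140)
Keywords: K-means; MapReduce; Hadoop; parallel sampling; K-selection sort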
