Improvement of the K-Means Algorithm and Its Implementation on the Spark Computing Model (cited by: 11)
Abstract K-Means is a partition-based clustering algorithm that is simple to implement and relatively efficient. However, it depends strongly on the choice of initial centers, the number of clusters K is not always known in advance, and its frequent iterations incur a high resource cost. To address these problems, the original K-Means algorithm is improved by introducing the Canopy algorithm and the minimum-maximum distance algorithm and, given the realities of big data, the improved algorithm is implemented on the Spark parallel computing framework. Experimental results show that the improved clustering algorithm gains in classification stability, accuracy, and convergence speed, and that it has a considerable performance advantage when processing large-scale data.
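The abstract does not spell out the authors' procedure, so the following is only a minimal sketch of the classical max-min distance rule for choosing initial centers, which is one common way to reduce the dependence on random initialization and to estimate K. The function name `max_min_centers`, the stopping ratio `theta`, the choice of the first point as the first center, and the sample data are all assumptions made for illustration; the Canopy step and the Spark parallelization described in the paper are not shown.

```python
import math


def max_min_centers(points, theta=0.5):
    """Choose initial cluster centers with the classical max-min distance rule.

    theta is a hypothetical stopping ratio: selection stops once the farthest
    remaining point is no farther than theta times the distance between the
    first two centers. The number of returned centers is the estimate of K.
    """
    centers = [points[0]]  # assumption: the first point seeds the selection
    # min_dist[i] is the distance from points[i] to its nearest chosen center
    min_dist = [math.dist(p, centers[0]) for p in points]
    idx = max(range(len(points)), key=min_dist.__getitem__)
    base = min_dist[idx]  # distance between the first two centers
    if base == 0:
        return centers  # all points coincide, so there is a single cluster
    while True:
        new_center = points[idx]
        centers.append(new_center)
        # refresh each point's distance to its nearest center
        min_dist = [min(d, math.dist(p, new_center))
                    for d, p in zip(min_dist, points)]
        # candidate for the next center: the point farthest from all centers
        idx = max(range(len(points)), key=min_dist.__getitem__)
        if min_dist[idx] <= theta * base:
            return centers


# Illustrative data: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (0, 10), (1, 10), (0, 11)]
centers = max_min_centers(points)
print(len(centers), centers)
```

The returned centers could then seed an ordinary K-Means run in place of random initial centers. A smaller `theta` yields more centers and a larger one yields fewer, so the value would need tuning for real data.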
Authors Xu Pengcheng (徐鹏程), Wang Cheng (王诚)
Source Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition) (《南京邮电大学学报(自然科学版)》), PKU Core journal, 2017, No. 4, pp. 113-118 (6 pages)
Keywords K-Means; Canopy algorithm; minimum-maximum distance algorithm; Spark