An improved k-means clustering algorithm based on MapReduce (cited by: 2)
Abstract: In the traditional k-means algorithm, the cluster centers stabilize only after many iterations, and k-means implementations under the MapReduce computing framework handle such iterative computation inefficiently. To address this, a new MapReduce-based k-means clustering algorithm is proposed. It improves on traditional k-means by recasting the clustering problem as a k-means++ clustering problem carried out in two stages, Map and Reduce, and by introducing the concept of weights and a single-pass technique into the traditional k-means++ algorithm, which raises the algorithm's execution efficiency in the MapReduce framework. Experimental analysis shows that the proposed method achieves better speedup and scalability than the traditional method.
Authors: GUO Chenchen (郭晨晨), ZHU Hongkang (朱红康) (School of Mathematics and Computer Science, Shanxi Normal University, Linfen, Shanxi 041000, China)
Source: Journal of Hebei University of Technology (《河北工业大学学报》), CAS, 2016, No. 5, pp. 35-43 (9 pages)
Funding: Natural Science Foundation of Shanxi Province (2015011040)
Keywords: k-means; MapReduce; two stages; single pass; parallelization; speedup
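
The abstract names the ingredients (a Map stage and a Reduce stage, k-means++ seeding, weights, a single pass) but does not spell out how they fit together. The sketch below is one plausible reading, not the authors' implementation: each Map task makes one pass over its partition, picks k local centers with k-means++ seeding, and weights each center by the number of points nearest to it; the Reduce task then runs weighted k-means++ over the collected centers. All function names, the synthetic data, and the choice of four partitions are assumptions made for illustration, and the code runs on a single machine rather than on Hadoop.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeanspp(points, weights, k, rng):
    """Weighted k-means++ seeding (hypothetical helper).

    Each new center is drawn with probability proportional to
    weight * squared distance to the nearest center chosen so far.
    """
    first = rng.choices(range(len(points)), weights=weights, k=1)[0]
    centers = [points[first]]
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        scores = [w * d for w, d in zip(weights, d2)]
        if sum(scores) == 0:
            # Every point coincides with a chosen center; return fewer than k.
            break
        idx = rng.choices(range(len(points)), weights=scores, k=1)[0]
        centers.append(points[idx])
        d2 = [min(d, sq_dist(p, points[idx])) for p, d in zip(points, d2)]
    return centers


def map_phase(partition, k, rng):
    """Assumed Map stage: seed k local centers, then make a single pass
    over the partition to count the points nearest to each center."""
    centers = weighted_kmeanspp(partition, [1.0] * len(partition), k, rng)
    counts = [0] * len(centers)
    for p in partition:
        nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        counts[nearest] += 1
    return list(zip(centers, counts))


def reduce_phase(weighted_centers, k, rng):
    """Assumed Reduce stage: weighted k-means++ over all local centers."""
    points = [c for c, _ in weighted_centers]
    weights = [float(w) for _, w in weighted_centers]
    return weighted_kmeanspp(points, weights, k, rng)


# Synthetic demo data: three Gaussian blobs, split into four partitions.
rng = random.Random(0)
blobs = [(0.0, 0.0), (10.0, 10.0), (-10.0, 10.0)]
data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
        for cx, cy in blobs for _ in range(200)]
rng.shuffle(data)
partitions = [data[i::4] for i in range(4)]

k = 3
local = [wc for part in partitions for wc in map_phase(part, k, rng)]
final_centers = reduce_phase(local, k, rng)
print(final_centers)
```

In this reading, the weights let the Reduce stage treat each local center as a stand-in for the points it represents, so the full data set is touched only once, in the Map stage. Whether the paper follows the seeding with further refinement iterations is not stated in the abstract.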
