Abstract
In the traditional k-means algorithm, the cluster centers stabilize only after many iterations, and k-means clustering on the MapReduce computing framework handles such iterative computation inefficiently. To address this problem, a new MapReduce-based k-means clustering algorithm is proposed. The algorithm improves on traditional k-means by recasting the k-means clustering problem as a two-stage k-means++ clustering problem, with one stage in Map and one in Reduce, and by introducing the concept of weights and a single-pass technique into the traditional k-means++ algorithm. These changes improve the algorithm's execution efficiency in the MapReduce framework. Experimental analysis shows that the proposed method achieves better speedup and scalability than the traditional method.
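The abstract gives no implementation details, so the following is only a minimal single-machine sketch of one plausible reading of the two-stage idea, not the authors' algorithm. It assumes that each Map task runs k-means++ seeding on its own partition and emits every chosen center with a weight equal to the number of points nearest to it, and that a single Reduce task then runs weight-aware k-means++ seeding over those candidates, so the data is scanned only once. All function names (`weighted_kmeanspp`, `map_phase`, `reduce_phase`, `single_pass_seed`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeanspp(points, weights, k, rng):
    """Pick up to k centers by D^2 sampling, scaled by per-point weights."""
    # First center: sampled in proportion to weight alone.
    centers = [rng.choices(points, weights=weights, k=1)[0]]
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        scores = [w * d for w, d in zip(weights, d2)]
        if sum(scores) == 0:  # every point coincides with a chosen center
            break
        c = rng.choices(points, weights=scores, k=1)[0]
        centers.append(c)
        # Keep, for each point, the distance to its nearest center so far.
        d2 = [min(d, sq_dist(p, c)) for p, d in zip(points, d2)]
    return centers


def map_phase(partition, k, rng):
    """Assumed Map step: seed locally, then weight each center by cluster size."""
    centers = weighted_kmeanspp(partition, [1] * len(partition), k, rng)
    counts = [0] * len(centers)
    for p in partition:
        nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        counts[nearest] += 1
    return list(zip(centers, counts))


def reduce_phase(weighted_centers, k, rng):
    """Assumed Reduce step: weighted seeding over all Map outputs."""
    points = [c for c, _ in weighted_centers]
    weights = [w for _, w in weighted_centers]
    return weighted_kmeanspp(points, weights, k, rng)


def single_pass_seed(partitions, k, seed=0):
    """Simulate one Map/Reduce round in-process (no Hadoop involved)."""
    rng = random.Random(seed)
    candidates = []
    for part in partitions:
        candidates.extend(map_phase(part, k, rng))
    return reduce_phase(candidates, k, rng)
```

Under these assumptions, the Reduce step works on at most k candidates per partition rather than on the full data set, which is where a saving over repeated MapReduce iterations would come from.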
Authors
GUO Chenchen (郭晨晨)
ZHU Hongkang (朱红康)
School of Mathematics and Computer Science, Shanxi Normal University, Linfen, Shanxi 041000, China
Source
Journal of Hebei University of Technology (《河北工业大学学报》)
CAS
2016, No. 5, pp. 35-43 (9 pages)
Funding
Natural Science Foundation of Shanxi Province (2015011040)