Research on a KNN Classification Algorithm Based on MapReduce and Distributed Cache (cited by: 2)

Parallelized K-nearest neighbor algorithm based on MapReduce and distributed cache
Abstract: With the arrival of the big data era, the high computational complexity of the K-nearest neighbor (KNN) algorithm has become an increasingly serious drawback. Building on an in-depth study of the KNN algorithm, this paper combines the MapReduce programming model with its open-source implementation, Hadoop, and proposes a KNN parallelization scheme based on MapReduce and the distributed cache mechanism. The scheme completes the classification task in the Mapper stage alone, which reduces the communication overhead between the TaskTracker and the JobTracker and avoids transferring Mapper intermediate results between task nodes in the cluster. Experiments on a Hadoop cluster show that the proposed parallel KNN scheme achieves good speedup and scalability.
Source: Microcomputer & Its Applications (《微型机与应用》), 2015, No. 2, pp. 18-21 (4 pages)
Keywords: KNN classification algorithm; K-nearest neighbor algorithm; parallelization; MapReduce programming model; Hadoop; distributed cache
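
The map-only idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a plain-Python simulation with no Hadoop dependency, and it assumes that the training set is the data placed in the distributed cache while the test records form the job input (the abstract does not state this explicitly). The function names, the toy data, and the choice of Euclidean distance with majority voting are all hypothetical.

```python
import heapq
import math
from collections import Counter


def load_cached_training_set():
    """Stand-in for reading the training set from the distributed cache.

    Hypothetical toy data: each entry is (feature_vector, label).
    """
    return [
        ((1.0, 1.0), "A"),
        ((1.2, 0.8), "A"),
        ((0.9, 1.1), "A"),
        ((5.0, 5.0), "B"),
        ((5.2, 4.9), "B"),
        ((4.8, 5.1), "B"),
    ]


def knn_map(record_id, features, training_set, k=3):
    """Map function: classify one test record locally, emit (record_id, label)."""
    # Pick the k training points closest to the test record (Euclidean distance).
    nearest = heapq.nsmallest(
        k, training_set, key=lambda item: math.dist(features, item[0])
    )
    # Majority vote among the k neighbours' labels.
    votes = Counter(label for _, label in nearest)
    return record_id, votes.most_common(1)[0][0]


def run_map_only_job(input_splits, k=3):
    """Simulate a job with no reduce phase: map output is the final output."""
    results = []
    for split in input_splits:
        # Each simulated map task loads the cached training set once, before
        # processing its records, so no shuffle between nodes is needed.
        training_set = load_cached_training_set()
        for record_id, features in split:
            results.append(knn_map(record_id, features, training_set, k))
    return results


# Two hypothetical input splits of test records: (record_id, feature_vector).
splits = [
    [(0, (1.1, 0.9)), (1, (4.9, 5.0))],
    [(2, (0.8, 1.2))],
]
predictions = run_map_only_job(splits)
print(predictions)
```

Because every map task holds a full copy of the training set, each test record can be labelled without any exchange of intermediate results. In a real Hadoop job this would roughly correspond to a Mapper that reads the cached file in its setup step and a job configured with zero reduce tasks, though the paper's exact configuration is not given in this record.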