Abstract
With the development of information technology, every industry has accumulated massive amounts of data, and the volume is growing exponentially. Extracting useful information from this data in order to provide better services has become particularly important. This paper exploits the strengths of the MapReduce programming model in processing massive data and, taking the characteristics of the KNN algorithm into account, designs corresponding Map and Reduce functions to implement a MapReduce parallelization of KNN. Experimental results show that, compared with the traditional serial KNN algorithm, the MapReduce-based parallel KNN algorithm achieves good scalability and speedup.
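The abstract does not spell out how the Map and Reduce functions are defined, so the following is only a minimal sketch of one common way to split KNN across a MapReduce job, not the paper's actual design. It assumes that each mapper receives one split of the training set plus the full set of test points and emits its k nearest local candidates per test point, and that the reducer merges the candidates and takes a majority vote. The names `knn_map`, `knn_reduce`, and `run_job` are hypothetical, and the shuffle phase is simulated in memory rather than run on Hadoop.

```python
import heapq
import math
from collections import Counter, defaultdict


def knn_map(train_split, test_points, k):
    """Map (assumed design): for one training split, emit up to k
    (distance, label) candidates per test point, keyed by test id."""
    for test_id, query in test_points:
        candidates = [
            (math.dist(query, features), label)
            for features, label in train_split
        ]
        # Only the k closest local neighbours can matter globally.
        for dist, label in heapq.nsmallest(k, candidates):
            yield test_id, (dist, label)


def knn_reduce(test_id, candidates, k):
    """Reduce (assumed design): merge candidates from all splits,
    keep the k nearest overall, and return the majority label."""
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return test_id, votes.most_common(1)[0][0]


def run_job(train_splits, test_points, k):
    """In-memory stand-in for the shuffle phase: group map output
    by key, then call the reducer once per key."""
    grouped = defaultdict(list)
    for split in train_splits:
        for key, value in knn_map(split, test_points, k):
            grouped[key].append(value)
    return dict(knn_reduce(key, values, k) for key, values in grouped.items())
```

Because each mapper forwards at most k candidates per test point, the data shuffled to the reducers grows with the number of splits rather than with the size of the training set, which is one plausible reason such a scheme scales as more nodes are added.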
Source
Computer & Digital Engineering (《计算机与数字工程》)
2013, No. 11, pp. 1738-1740 (3 pages)