Abstract
Recommender systems are an important part of data mining, enabling fast, comprehensive, and accurate filtering of massive amounts of data. However, recommendation algorithms implemented on a traditional single machine take too long to compute and can no longer meet today's commercial demand for fast, reliable technology. Spark, a distributed computing framework for big data, introduces the concept of the RDD (resilient distributed dataset) and an in-memory computing model, which makes it better suited to big data mining. Because recommendation algorithms involve many iterative computations, using Spark can greatly improve the computational efficiency of a recommender system. This paper uses the Spark platform to design a product recommendation system based on the item-based collaborative filtering (Item-CF) algorithm and tests it on the MovieLens dataset. The experimental results show that the system improves recommendation accuracy and reduces running time.
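For readers unfamiliar with Item-CF, the following is a minimal single-machine sketch of the general technique, not the authors' Spark implementation. It assumes cosine similarity between item rating vectors and a similarity-weighted average for prediction; the abstract does not say which similarity measure or prediction rule the paper uses. The function names `item_similarities` and `recommend` and the toy ratings are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(ratings):
    """Cosine similarity between items that share at least one rater.

    ratings: dict mapping user -> dict mapping item -> rating.
    """
    norms = defaultdict(float)  # sum of squared ratings per item
    dots = defaultdict(float)   # dot product per co-rated item pair
    for user_ratings in ratings.values():
        items = sorted(user_ratings)
        for item in items:
            norms[item] += user_ratings[item] ** 2
        for idx, a in enumerate(items):
            for b in items[idx + 1:]:
                dots[(a, b)] += user_ratings[a] * user_ratings[b]

    sims = defaultdict(dict)
    for (a, b), dot in dots.items():
        sim = dot / (sqrt(norms[a]) * sqrt(norms[b]))
        sims[a][b] = sim
        sims[b][a] = sim
    return sims


def recommend(ratings, sims, user, top_n=3):
    """Rank unseen items by the similarity-weighted average of the user's ratings."""
    seen = ratings[user]
    scores = defaultdict(float)
    weights = defaultdict(float)
    for item, rating in seen.items():
        for other, sim in sims.get(item, {}).items():
            if other in seen:
                continue  # only recommend items the user has not rated
            scores[other] += sim * rating
            weights[other] += sim
    ranked = sorted(
        ((scores[i] / weights[i], i) for i in scores if weights[i] > 0),
        reverse=True,
    )
    return [item for _, item in ranked[:top_n]]


# Toy data (made up for illustration, not taken from MovieLens).
toy_ratings = {
    "u1": {"A": 5, "B": 4},
    "u2": {"A": 4, "B": 5, "C": 3},
    "u3": {"B": 2, "C": 5},
}
toy_sims = item_similarities(toy_ratings)
toy_recs = recommend(toy_ratings, toy_sims, "u1")
```

In this sketch the per-user loop that accumulates pairwise dot products is the expensive part on large datasets, and it is the kind of step a distributed framework would parallelize across users.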
Authors
李星
李涛
LI Xing; LI Tao (School of Communication and Information Technology, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)
Source
《计算机技术与发展》
2018, No. 10, pp. 194-198 (5 pages)
Computer Technology and Development
Funding
National Natural Science Foundation of China (61572260)