
Design and Implementation of Recommendation System Based on Spark

Cited by: 8
Abstract: Recommendation systems are an important part of data mining and can filter massive amounts of data quickly, comprehensively, and accurately. However, recommendation algorithms implemented on the traditional single-machine model take too long to compute and can no longer meet today's commercial demand for fast, reliable results. Spark, a distributed computing framework for big data platforms, introduces the concept of the RDD (resilient distributed dataset) and an in-memory computing model, which makes it better suited to big data mining. Recommendation algorithms involve many iterative calculations, so using the Spark framework can greatly improve the computational efficiency of a recommendation system. This paper uses the Spark platform to design a product recommendation system based on the item-based collaborative filtering (Item-CF) algorithm and tests it on the MovieLens dataset. The experimental results show that the system can improve recommendation accuracy and reduce running time.
Authors: LI Xing; LI Tao (School of Communication and Information Technology, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)
Source: Computer Technology and Development, 2018, No. 10, pp. 194-198 (5 pages)
Funding: National Natural Science Foundation of China (61572260)
Keywords: big data; Spark platform; recommendation system; collaborative filtering (CF); data mining
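
The abstract names item-based collaborative filtering (Item-CF) but this record does not show the paper's implementation. The following is only a minimal single-machine sketch of the general technique, under several assumptions that do not come from the source: cosine similarity over co-rated items, a similarity-weighted average for prediction, and hypothetical names (`item_similarities`, `recommend`, the toy `ratings` list). On Spark, the per-user co-rating loop would typically be expressed as distributed grouping and aggregation over the ratings; that part is not shown here.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(ratings):
    """Cosine similarity between items, from (user, item, rating) triples.

    Assumption: cosine similarity; the paper's actual measure is not stated here.
    """
    by_user = defaultdict(dict)
    for user, item, rating in ratings:
        by_user[user][item] = rating

    dot = defaultdict(float)   # sum of rating products over co-rating users
    norm = defaultdict(float)  # sum of squared ratings per item
    for items in by_user.values():
        for i, ri in items.items():
            norm[i] += ri * ri
            for j, rj in items.items():
                if i != j:
                    dot[(i, j)] += ri * rj

    return {
        (i, j): d / (sqrt(norm[i]) * sqrt(norm[j]))
        for (i, j), d in dot.items()
    }


def recommend(ratings, user, top_n=2):
    """Predict ratings for items the user has not rated; return the top_n."""
    sims = item_similarities(ratings)
    seen = {i: r for u, i, r in ratings if u == user}

    scores = defaultdict(float)
    weights = defaultdict(float)
    for (i, j), s in sims.items():
        if i in seen and j not in seen:
            scores[j] += s * seen[i]  # similarity-weighted rating
            weights[j] += s

    preds = {j: scores[j] / weights[j] for j in scores if weights[j] > 0}
    # Highest predicted rating first; ties broken by item id.
    return sorted(preds.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Toy data (hypothetical, not taken from MovieLens).
ratings = [
    ("u1", "A", 5), ("u1", "B", 3),
    ("u2", "A", 4), ("u2", "B", 2), ("u2", "C", 5),
    ("u3", "B", 4), ("u3", "C", 1),
]

if __name__ == "__main__":
    print(recommend(ratings, "u1"))
```

Here user `u1` has rated items `A` and `B`, so the only candidate is `C`, and its predicted rating is a weighted blend of `u1`'s existing ratings.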
