Design and Implementation of a Spark-Based Real-Time E-Commerce Recommender System for Big Data (cited by: 22)
Abstract E-commerce recommender systems built on the Hadoop platform for big data compute slowly and cannot make recommendations based on users' real-time behavior. To address these problems, this paper designs and implements a real-time e-commerce recommender system on the Spark platform. In contrast to Hadoop-based systems, the system first builds a distributed log collection module and a distributed log data transmission module on Spark to collect and transfer logs of implicit user behavior, which solves the problem of gathering data from sources spread across separate e-commerce systems. Second, on top of this unified data source, it trains a Spark-based matrix factorization recommendation model offline, which makes offline training more efficient. Third, building on the offline recommendations, it proposes a scheme that uses Spark Streaming to filter e-commerce log data in real time, obtains the goods the user currently needs (those most similar to the item recorded in the log), and fuses the offline and real-time recommendation results through a unified medium, so that implicit user behavior receives real-time recommendation feedback. Experiments show that, compared with the Hadoop-based recommender system, the Spark-based system is more reliable and stable and can carry large-scale data volumes; offline training is 10 times faster than on the Hadoop platform; and the system responds to users' real-time behavior with real-time recommendations, raising the transaction conversion rate by 5% and improving the user experience of the e-commerce site.
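The fusion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and it uses plain Python rather than Spark or Spark Streaming. It assumes a precomputed item-similarity table and a simple weighted sum as the fusion rule; the function names (`realtime_candidates`, `merge_recommendations`), the `realtime_weight` parameter, and all item IDs and scores are hypothetical.

```python
from collections import defaultdict


def realtime_candidates(event_item, similar_items, n=3):
    """Return up to n (item_id, similarity) pairs for the item seen in a log event.

    similar_items is an assumed, precomputed lookup table:
    item_id -> list of (item_id, similarity), most similar first.
    """
    return similar_items.get(event_item, [])[:n]


def merge_recommendations(offline, realtime, k=5, realtime_weight=2.0):
    """Blend offline and real-time (item_id, score) lists into a top-k list.

    The weighted sum is an illustrative assumption, not the paper's rule.
    """
    combined = defaultdict(float)
    for item, score in offline:
        combined[item] += score
    for item, score in realtime:
        combined[item] += realtime_weight * score
    # Sort by descending score; break ties by item id for determinism.
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Hypothetical data: offline model output for one user, plus a similarity table.
offline_recs = [("a", 0.9), ("b", 0.8), ("c", 0.4)]
similarity_table = {"x": [("c", 0.5), ("d", 0.3)]}

# The user has just viewed item "x"; look up items similar to it and blend.
recent = realtime_candidates("x", similarity_table)
top_items = merge_recommendations(offline_recs, recent, k=3)
```

Here item "c" ranks low offline but is similar to the item just viewed, so the blend moves it to the top. In a deployed system the weight would need tuning against conversion data.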
Source Modern Computer (《现代计算机》), 2016, No. 16, pp. 61-69 (9 pages)
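Funding National Natural Science Foundation of China (No. 61562056); Ministry of Education Humanities and Social Sciences Youth Fund (No. 13YJC630210); 2014 Shanghai Science and Technology Innovation Fund Project (No. 1401H164800); Shanghai Yangpu District Special Fund for the Construction and Management of the National Innovative Pilot Urban District (No. 2015YPCX03-002)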
Keywords big data; Spark platform; Hadoop platform; real-time recommendation; implicit user behavior