Abstract
E-commerce recommendation systems built on the Hadoop platform for big data suffer from slow computation and cannot make recommendations based on users' real-time behavior. To address these problems, this paper designs and implements a real-time e-commerce recommendation system based on the Spark platform. In contrast to recommendation systems built on Hadoop, the system first builds a distributed log collection module and a distributed log data transmission module on the Spark platform to collect and transmit logs of implicit user behavior, which solves the problem of gathering data from sources spread across different e-commerce systems. Second, on the basis of this unified data source, a Spark-based matrix factorization recommendation model is trained offline, which improves the efficiency of offline recommendation training. Third, building on the offline recommendations, the paper proposes a scheme that uses Spark Streaming to filter e-commerce log data in real time, retrieves the goods most similar to the item recorded in each log entry as the goods the user currently needs, and merges the offline and real-time recommendation results through a unified medium, so that the system gives real-time recommendation feedback on implicit user behavior. Experiments show that, compared with the Hadoop-based recommendation system, the Spark-based system is more reliable and stable and can carry large-scale data volumes; offline recommendation training is 10 times as fast as on the Hadoop platform; and the system responds to users' real-time behavior with real-time recommendations, raising the transaction conversion rate by 5% and improving the user experience of the e-commerce site.
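The merging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain Python instead of Spark Streaming, models the "unified medium" as an in-memory dictionary, and assumes a precomputed item-similarity table. The names `similar_items`, `merge_recommendations`, `handle_event`, and the key layout of `store` are all hypothetical.

```python
def similar_items(item_id, similarity, top_n=3):
    """Return up to top_n items most similar to item_id.

    `similarity` is an assumed precomputed table:
    {item_id: {other_item_id: similarity_score}}.
    """
    ranked = sorted(similarity.get(item_id, {}).items(),
                    key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in ranked[:top_n]]


def merge_recommendations(realtime, offline, limit=5):
    """Merge two ranked lists, real-time results first, without duplicates."""
    merged = []
    for item in list(realtime) + list(offline):
        if item not in merged:
            merged.append(item)
    return merged[:limit]


def handle_event(store, user_id, item_id, similarity, limit=5):
    """Process one implicit-behavior log event (for example, a product view).

    `store` stands in for the shared medium that holds both the offline
    results and the merged list served to the user (an assumption here).
    """
    realtime = similar_items(item_id, similarity)
    offline = store.get(("offline", user_id), [])
    store[("merged", user_id)] = merge_recommendations(realtime, offline, limit)
    return store[("merged", user_id)]


if __name__ == "__main__":
    similarity = {"phone": {"case": 0.9, "charger": 0.8, "laptop": 0.2}}
    store = {("offline", "u1"): ["laptop", "book", "case"]}
    print(handle_event(store, "u1", "phone", similarity, limit=4))
```

Placing real-time candidates ahead of the offline list is one simple design choice for letting recent behavior dominate; a weighted blend of scores would be an equally plausible alternative.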
Source
Modern Computer (《现代计算机》)
2016, No. 16, pp. 61-69 (9 pages)
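Funding
National Natural Science Foundation of China (No. 61562056)
Ministry of Education Humanities and Social Sciences Youth Fund Project (No. 13YJC630210)
2014 Shanghai Science and Technology Innovation Fund Project (No. 1401H164800)
Special Fund Project for the Construction and Management of the Yangpu District National Innovative Pilot Urban Area, Shanghai (No. 2015YPCX03-002)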