Abstract
To address the processing-speed and computing-resource bottlenecks that Alternating Least Squares (ALS) faces on large datasets, a distributed matrix factorization recommendation algorithm based on a similar-user index is proposed. The algorithm first finds the nearest neighbors of each user from the users' rating behavior; it is then run on the Spark platform to produce recommendations. Simulation experiments on the MovieLens dataset provided by the GroupLens website show that the proposed algorithm can effectively address the low execution efficiency of ALS on large-scale datasets and its poor scalability in cloud environments.
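The nearest-neighbor step mentioned in the abstract can be illustrated with a minimal single-machine sketch. The abstract does not say which similarity measure or index structure the paper uses, so cosine similarity over sparse rating vectors, the function names `cosine_similarity` and `build_similar_user_index`, the parameter `k`, and the toy ratings are all assumptions made for illustration. This is not the authors' Spark implementation.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating dicts {item_id: rating}."""
    common = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_similar_user_index(ratings, k):
    """Map each user to the ids of their k most similar other users.

    Assumption: similarity is cosine similarity of rating vectors;
    ties are broken by user id so the result is deterministic.
    """
    index = {}
    for u, ru in ratings.items():
        scored = [
            (cosine_similarity(ru, rv), v)
            for v, rv in ratings.items()
            if v != u
        ]
        scored.sort(key=lambda t: (-t[0], t[1]))
        index[u] = [v for _, v in scored[:k]]
    return index


# Hypothetical toy data: user id -> {movie id: rating}
ratings = {
    "u1": {"m1": 5.0, "m2": 4.0},
    "u2": {"m1": 5.0, "m2": 5.0},
    "u3": {"m3": 4.0},
}
index = build_similar_user_index(ratings, k=1)
```

In this toy data, `u1` and `u2` rate the same movies similarly and end up as each other's nearest neighbor, while `u3` shares no rated movie with anyone and falls back to the tie-break order. The all-pairs comparison here is quadratic in the number of users, so a distributed version would need to partition that work across the cluster.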
Source
Journal of Shaanxi University of Technology (Natural Science Edition) (《陕西理工学院学报(自然科学版)》)
2016, No. 6, pp. 47-52 (6 pages)
Funding
Scientific Research Fund of the Yunnan Provincial Department of Education (2014Y145)
Keywords
alternating least squares
nearest neighbors
recommendation algorithm
Spark