Abstract
To increase parallelism in big-data settings, this paper proposes a vertical FP-growth mining algorithm based on MapReduce, combining the MapReduce model with a traditional mining algorithm. First, the Map function parses the items in the transaction database, and the Reduce function computes the support of the frequent items and merges the global frequent trees, which parallelizes the iterative process of the vertical FP-growth algorithm. Next, accurate frequent itemsets and association rules are obtained by computing the global frequent items. Finally, experiments show that the proposed algorithm not only preserves the accuracy of the original FP-growth algorithm but also achieves high cluster performance and execution efficiency when processing big data.
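The Map and Reduce roles described in the abstract can be illustrated with a minimal single-process sketch. It is not the paper's implementation: it covers only the first step (counting item support), it assumes that "vertical" refers to an item-to-transaction-ID layout, and the names map_transaction, shuffle, reduce_item and frequent_items are hypothetical. Tree merging and rule generation are omitted.

```python
from collections import defaultdict


def map_transaction(tid, items):
    """Map step (assumed): emit one (item, tid) pair per distinct item."""
    for item in set(items):
        yield item, tid


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce runtime does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_item(item, tids, min_support):
    """Reduce step (assumed): support is the number of transactions holding the item."""
    support = len(set(tids))
    if support >= min_support:
        return item, support
    return None  # item is not frequent


def frequent_items(transactions, min_support):
    """Run map, shuffle and reduce in sequence and return {item: support}."""
    pairs = [
        pair
        for tid, items in enumerate(transactions)
        for pair in map_transaction(tid, items)
    ]
    result = {}
    for item, tids in shuffle(pairs).items():
        kept = reduce_item(item, tids, min_support)
        if kept is not None:
            result[kept[0]] = kept[1]
    return result


# Hypothetical toy transaction database, for illustration only.
transactions = [["a", "b", "c"], ["a", "c"], ["a", "d"], ["b", "c"]]
print(frequent_items(transactions, min_support=2))
```

In a real cluster the map and reduce calls would run on separate nodes and the grouping would be done by the framework; here they run in one process only to show the data flow.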
Authors
WANG Rongbing (王嵘冰)
XU Hongyan (徐红艳)
WEI Lianlian (魏莲莲)
School of Information, Liaoning University, Shenyang 110036
Source
Computer & Digital Engineering (《计算机与数字工程》)
2018, No. 7, pp. 1284-1287, 1296 (5 pages in total)
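Funding
Supported by the Liaoning Province Doctoral Scientific Research Start-up Fund (No. 201601099)
and the Liaoning Province Social Science Planning Project (No. L14DGL049)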