期刊文献+

基于Spark的PFP-Growth并行算法优化实现 被引量:6

Optimization of parallel FP-Growth algorithm based on Spark
下载PDF
导出
摘要 随着数据量的增大,FP-Growth算法压缩数据思想的优势就体现出来,基于MapReduce框架的PFP-Growth算法实现该算法在Hadoop平台上的并行化,但是MapReduce框架每次对作业进行操作都要将中间结果输出存储到磁盘,影响算法的效率。为了提高关联挖掘的效率,基于Spark平台,运用均衡分组的思想对该算法进行改进,同时在对具有很长前缀情况进行共享前缀的拆分,通过4个步骤使IPFP-Growth算法在Spark上实现。实验结果表明在Spark平台上优化过后的算法在性能上要优于PFP-Growth算法。 The advantage of the FP-Growth algorithm for compressing data is reflected with the increasing of the data size.With the MapReduce framework,the PFP-Growth algorithm can be parallelized on the Hadoop platform. However,when processing tasks with the MapReduce framework,the intermediate results need to be written to the disk,which will affect the efficiency of the algorithm. Therefore,based on Spark platform,this algorithm was improved according to the concept of balanced grouping to improve the efficiency of association mining. In addition,if there is a long prefix,the improved algorithm will split the shared prefix. The IPFP-Growth is implemented in Spark through four steps. The experimental results show that the performance of the algorithm optimized in Spark is superior to that of the PFP-Growth algorithm.
作者 方向 张功萱
出处 《现代电子技术》 北大核心 2016年第8期9-13,共5页 Modern Electronics Technique
基金 江苏省973项目(BK2011022) 国家自然科学基金重点项目(612724420)
关键词 并行化 SPARK 关联挖掘 PFP-Growth parallelization Spark association mining PFP-Growth
  • 相关文献

参考文献10

  • 1HAN J, PEI J, YIN Y. Mining frequent patterns without candi- date generation [J]. ACM SIGMOD record, 2000, 29(2): 1-12. 被引量:1
  • 2LI H, WANG Y, ZHANG D, et al. PFP: parallel FP-Growth for query recommendation [C]// Proceedings of the 2008 ACM Con- ference on Recommender Systems. rS.I.I: ACM. 2008; 107-114+. 被引量:1
  • 3王智钢,王池社,马青霞.分布式并行关联规则挖掘算法研究[J].计算机应用与软件,2013,30(10):113-115. 被引量:13
  • 4曾志勇,杨呈智,陶冶.负载均衡的FP-growth并行算法研究[J].计算机工程与应用,2010,46(4):125-126. 被引量:10
  • 5周诗慧..基于Hadoop的改进的并行Fp-Growth算法[D].山东大学,2013:
  • 6吕雪骥,李龙澍.FP-Growth算法MapReduce化研究[J].计算机技术与发展,2012,22(11):123-126. 被引量:18
  • 7杨雅双..关联规则的并行挖掘算法研究[D].西安科技大学,2010:
  • 8ZHOU L, ZHONG Z, CHANG J, et al. Balanced parallel FP- Growth with MapReduce [C]// Proceedings of 2011 IEEE Youth Conference on Information Computing and Telecommunica- tions. [S.1.]: IEEE, 2010: 243-248. 被引量:1
  • 9高彦杰著..Spark大数据处理 技术、应用与性能优化[M].北京:机械工业出版社,2014:255.
  • 10QIU H, GU R, YUAN C, et al. YAFIM: A parallel frequent itemset mining algorithm with Spark [C]// 2014 IEEE Interna- tional Parallel & Distributed Processing Symposium Work- shops. [S.1.]: IEEE, 2014: 1664-1671. 被引量:1

二级参考文献28

  • 1谈克林,孙志挥.一种FP树的并行挖掘算法[J].计算机工程与应用,2006,42(13):155-157. 被引量:10
  • 2Agrawal R,Imielinski T,Swami A.Mining association rules between sets of items in large databases[C]//Proceedings of the ACM SIG- MOD International Conference Management of Date,Washington,1993:207-216. 被引量:1
  • 3Ha n J,Kamber M.Data mining:Concepts and techniques[M].Beijing: High Education Press, 2001. 被引量:1
  • 4Agrawal R,Imielinski T,Swami A.Mining association rules between sets of items in large databases[C]//Proc 1993 ACM-SIGMOD Int Conf Management of Data, Washington, DC, May 1993 : 207-216. 被引量:1
  • 5Savasere A,Omiecinski E,Navathe S.An efficient algorithm for mining association rules in large databases[C]//Proc of the 21st VLDB Conference, Zurich, Switzerland, 1995 : 432-443. 被引量:1
  • 6Agrawal R C,Agarwal C,Prasad V V V.A tree projection algorithm for generation of frequent itemscts[J].Journal of Parallel and Distributed Computing:Special Issue on High Performance Data Mining, 2000. 被引量:1
  • 7Han J W,Pei J,Yin Y W,et al.Mining frequent patterns without candidate generation[C]//Proc ACM-SIGMOD Int Conf on Management of Data(SIGMOD'00),DalIas,TX,2000:1-12. 被引量:1
  • 8Han J,Pei J,Yin Y,et al.Mining frequent patterns without candidate generation : A frequent-pattern tree approach[J].Data Mining and Knowledge Discovery, 2004,8 : 53-87. 被引量:1
  • 9Han Jiawei, Pei Jian, ytin Yiwen. Mining frequent patterns without candidate generation [ C ]//SIGMOD' 00. [ s. 1.] :[ s. n. ] ,2000. 被引量:1
  • 10Agrawal R, Shafer J C. Parallel mining of association rules[J].1EEE Transactions on knowledge and date engineering, 1996,8(6) :962-969. 被引量:1

共引文献34

同被引文献42

引证文献6

二级引证文献55

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部