期刊文献+

基于MapReduce的并行频繁项集挖掘算法研究 被引量:7

Research on parallel frequent itemset mining algorithm based on MapReduce
下载PDF
导出
摘要 针对并行MRPrePost(parallel prepost algorithm based on MapReduce)频繁项集挖掘算法在大数据环境存在运行时间长、内存占用量大和节点负载不均衡的问题,提出一种基于DiffNodeset的并行频繁项集挖掘算法(parallel frequent itemsets mining using DiffNodeset,PFIMD)。该算法首先采用一种数据结构DiffNodeset,有效地避免了N-list基数过大的问题;此外提出一种双向比较策略(2-way comparison strategy,T-wcs),以减少两个DiffNodeset在连接过程中的无效计算,极大地降低了算法时间复杂度;最后考虑到集群负载对并行算法效率的影响,进一步提出了一种基于动态分组的负载均衡策略(load balancing strategy based on dynamic grouping,LBSBDG),该策略通过将频繁1项集F-list中的每项进行均匀分组,降低了集群中每个计算节点上PPC-Tree树的规模,进而减少了先序后序遍历PPC-Tree树所需的时间。实验结果表明,该算法在大数据环境下进行频繁项集挖掘具有较好的效果。 Aiming at the problem of excessive time,space complexity and unbalanced load for each node based on the parallel frequent itemset mining algorithm MRPrePost,this paper proposed an optimization parallel frequent itemset mining algorithm based on MapReduce,named PFIMD.Firstly,this algorithm adopted a data structure called DiffNodeset,which effectively avoided the defect that the N-list cardinality got very large in the MRPrePost algorithm.Secondly,in order to reduce the time complexity of this algorithm,it designed the T-wcs to avoid the invalid calculation in the procession of two DiffNodesets connection.Finally,considering the impact of cluster load on the efficiency of parallel algorithm,it proposed the LBSBDG,which decreased the size of PPC-Tree on each computing node and reduced the amount of time required to traverse the PPC-Tree by evenly grouping each item in the F-list.The experimental results show that the modified algorithm has better performance on mining frequent itemset in a big data environment.
作者 刘卫明 张弛 毛伊敏 Liu Weiming;Zhang Chi;Mao Yimin(School of Information Engineering Jiangxi University of Science&Technology,Ganzhou Jiangxi 341099,China)
出处 《计算机应用研究》 CSCD 北大核心 2021年第3期689-695,共7页 Application Research of Computers
基金 国家自然科学基金资助项目(41562019) 国家重点研发计划资助项目(2018YFC1504705)。
关键词 DiffNodeset数据结构 MAPREDUCE T-wcs策略 LBSBDG策略 频繁项集挖掘 DiffNodeset structure MapReduce 2-way comparison strategy load balancing strategy based on dynamic grouping frequent item mining
  • 相关文献

参考文献14

二级参考文献213

  • 1施亮,钱雪忠.基于Hadoop的并行FP-Growth算法的研究与实现[J].微电子学与计算机,2015,32(4):150-154. 被引量:15
  • 2张尧学.透明计算:概念、结构和示例[J].电子学报,2004,32(F12):169-174. 被引量:48
  • 3邹翔,张巍,刘洋,蔡庆生.分布式序列模式发现算法的研究[J].软件学报,2005,16(7):1262-1269. 被引量:19
  • 4Dean J, Ghemmawat S. MapReduce: simplied data processing on large clusters [ C ]//Proceedings of the 6th Sympesium on Operating System Design and Implementation. New York: ACM Press, 2004:137 -150. 被引量:1
  • 5Ranger C, Raghuraman R, Penmetsa A. Evaluating MapReduce for multicore and mutiprocessor systems [ C ] //Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture. Washington: IEEE Computer Society, 2007 : 13 -24. 被引量:1
  • 6Kruuf M D, Sankaralinggam K. MapReduce for the cell B.E. architecture [ R ]. Madison: University of Wisconsin - Madison, 2007. 被引量:1
  • 7He Bing - sheng, Fang Wen - bin, Naga K Govindaraju, et al. Mars : a MapReduce framework on graphics processors [ C ] // Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques. New York: ACM Press, 2008 : 260 "269. 被引量:1
  • 8Zaharia M, Konwinski A, Joseph A D. Improving MapReduce performance in heterogeneous environments [ C ] //Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation. New York: ACM Press, 2008:29 -42. 被引量:1
  • 9Tomwhite.Hadoop权威指南:中文版[M].曾大聃,周傲英,译.北京:清华大学出版社,2010. 被引量:1
  • 10Chu Chen -tao, Kim S K, Lin Yian, et al. Map -Reduce for machine learning on muhicore [ C]//Twentieth Annual Conference on Neural Information Processing Systems, Vancouver: [ s. n. ], 2006 : 281 - 288. 被引量:1

共引文献214

同被引文献108

引证文献7

二级引证文献8

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部