
Frequent Itemset Mining Using Prefix Tree in Big Data Environment
(Original title: 大数据环境下基于前缀树的频繁项集挖掘) Cited by: 1
Abstract: To address the low efficiency and limited scalability of frequent itemset mining (FIM) in big data environments, a new distributed FIM algorithm running on the MapReduce framework is proposed. First, the algorithm uses a prefix sequence tree to construct candidate subsets, so that all frequent itemsets can be found without expensive exhaustive scans of the transaction database. Next, it generates frequent itemsets in a breadth-wide, support-based manner: each MapReduce iteration prunes the infrequent itemsets, which significantly reduces both memory consumption and the running time of each MapReduce job. Finally, the algorithm is compared experimentally with other algorithms under different transaction scales and support thresholds. The results show that the proposed sequence-growth algorithm achieves good efficiency and scalability, especially on large datasets and long itemsets.
Authors: HUANG Cai-juan (黄彩娟); LIU Zhuo-hua (刘卓华); SUO Hui (所辉); YANG Bin (杨滨) (School of Computer and Design, Guangdong Mechanical & Electrical Polytechnic, Guangzhou 510515, China; School of Design, Jiangnan University, Wuxi 214122, China)
Source: Control Engineering of China (《控制工程》), indexed in CSCD and the Peking University Core Journals list, 2019, No. 11, pp. 2136-2140 (5 pages)
Funding: Training Program for Outstanding Young Teachers in Higher Education Institutions of Guangdong Province (Yq2013171)
Keywords: frequent itemset mining; MapReduce; prefix sequence tree; fuzzy support; big data
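
The abstract describes the method only at a high level, so the following is a minimal single-process sketch of the general idea rather than the authors' implementation. It assumes itemsets are stored as sorted tuples, grows each frequent k-itemset (the prefix) by one larger item per pass, and prunes by a plain minimum support count. The function names `map_extend`, `reduce_prune`, and `mine_frequent_itemsets` are hypothetical, and each loop iteration only imitates one MapReduce round.

```python
from collections import Counter


def map_extend(transaction, prefixes):
    """Map step (illustrative): grow every frequent prefix found in a transaction.

    `transaction` is a sorted tuple of distinct items; `prefixes` holds the
    frequent itemsets of the previous level, also as sorted tuples.
    """
    items = set(transaction)
    for prefix in prefixes:
        if items.issuperset(prefix):
            for item in transaction:
                # Extending only with larger items generates each candidate
                # exactly once, from its own sorted prefix.
                if item > prefix[-1]:
                    yield prefix + (item,)


def reduce_prune(candidates, min_support):
    """Reduce step (illustrative): sum counts and drop infrequent itemsets."""
    counts = Counter(candidates)
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


def mine_frequent_itemsets(transactions, min_support):
    """Level-wise mining; each loop pass stands in for one MapReduce job."""
    txns = [tuple(sorted(set(t))) for t in transactions]
    # Level 1: count single items.
    level = reduce_prune(((item,) for t in txns for item in t), min_support)
    result = {}
    while level:
        result.update(level)
        # Only itemsets that survived pruning are carried to the next level.
        level = reduce_prune(
            (cand for t in txns for cand in map_extend(t, level)),
            min_support,
        )
    return result


if __name__ == "__main__":
    demo = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, support in sorted(mine_frequent_itemsets(demo, 3).items()):
        print(itemset, support)
```

Because every prefix of a frequent itemset is itself frequent, carrying only the surviving itemsets into the next pass loses no results while keeping the per-pass state small, which is the effect the abstract attributes to pruning in each iteration.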
