Abstract
To address the low efficiency of traditional frequent itemset mining algorithms, a parallel algorithm named Balanced_MapReduce_FIUT (BMR-FIUT), based on the Hadoop platform, is proposed. Frequent itemsets are mined with the frequent items ultrametric tree (FIU-Tree) structure, which avoids the shortcomings of the traditional algorithms. The decomposition process of the FIUT algorithm is modified to suit parallel computation under the MapReduce framework. Parallel entropy is used as the load-balance measure for the cluster system, so that data are distributed across the nodes as evenly as is practical. Experimental results show that BMR-FIUT effectively reduces load skew across nodes during parallelization, outperforms the existing PFP-Growth algorithm, and is suitable for mining massive data sets.
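The abstract does not give the formula for parallel entropy, so the following is only an illustrative sketch under an assumed definition: the Shannon entropy of each node's share of the total load, normalized by the logarithm of the node count. The function name `parallel_entropy` and the sample load figures are hypothetical and are not taken from the paper.

```python
import math


def parallel_entropy(loads):
    """Normalized entropy of per-node load shares.

    Assumed definition (not confirmed by the abstract): with p_i the share
    of the total load held by node i, H = -sum(p_i * ln p_i) / ln(n).
    Values near 1 indicate an even spread; values near 0 indicate that
    almost all of the load sits on a single node.
    """
    n = len(loads)
    total = sum(loads)
    if n < 2 or total <= 0:
        # A single node, or no load at all, is treated as trivially balanced.
        return 1.0
    # Idle nodes contribute nothing to the sum (p * ln p tends to 0 as p -> 0).
    shares = [x / total for x in loads if x > 0]
    entropy = -sum(p * math.log(p) for p in shares)
    return entropy / math.log(n)


# Hypothetical per-node record counts for a four-node cluster.
balanced = parallel_entropy([250, 250, 250, 250])
skewed = parallel_entropy([700, 100, 100, 100])
print(f"balanced={balanced:.3f} skewed={skewed:.3f}")
```

Under this assumed definition, a scheduler could compare candidate data partitions and prefer the one with the higher score, which matches the abstract's stated goal of distributing data evenly across nodes.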
Authors
晏依
徐苏
YAN Yi; XU Su (School of Information Engineering, Nanchang University, Nanchang 330031, China)
Source
《计算机工程与设计》
Peking University Core Journal (北大核心)
2019, Issue 3, pp. 685-690, 787 (7 pages in total)
Computer Engineering and Design