Abstract
The set enumeration tree (SET) is a data structure commonly used by algorithms for mining maximal frequent itemsets (MFI). In these algorithms, mining the MFI can be viewed as a search over the SET. To shrink that search space, this paper proposes a novel and efficient pruning method, the MPDR algorithm, which dynamically reorders the nodes of the enumeration tree according to the maximal frequent patterns already mined, so that pruning is applied as aggressively as possible. The algorithm uses a bitmap data format and a depth-first search strategy. Experimental results show that it effectively improves the efficiency of MFI mining and, on the same test data sets, outperforms the FPMax algorithm.
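The ingredients named in the abstract (bitmap data, depth-first search over a set enumeration tree, pruning against already-mined maximal patterns, and dynamic reordering) can be illustrated with a minimal Python sketch. This is not the paper's MPDR algorithm: the abstract does not state the exact reordering rule, so the heuristic below (items outside the best-overlapping known maximal itemset are explored first) is an assumption, and the names `build_bitmaps`, `popcount`, and `mine_maximal_itemsets` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Sequence


def build_bitmaps(transactions: Sequence[Iterable[str]]) -> Dict[str, int]:
    """Vertical bitmap layout: bit `tid` of bitmaps[item] is set when
    transaction `tid` contains `item` (a Python int serves as the bitmap)."""
    bitmaps: Dict[str, int] = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def popcount(bits: int) -> int:
    """Number of set bits, i.e. the support of the itemset behind `bits`."""
    return bin(bits).count("1")


def mine_maximal_itemsets(transactions: Sequence[Iterable[str]],
                          min_support: int) -> List[FrozenSet[str]]:
    bitmaps = build_bitmaps(transactions)
    found: List[FrozenSet[str]] = []  # maximal frequent itemsets mined so far

    def covered(itemset: FrozenSet[str]) -> bool:
        return any(itemset <= m for m in found)

    def search(head: FrozenSet[str], head_bits: int, tail: List[str]) -> None:
        # Keep only tail items that remain frequent when added to the head.
        exts = []
        for item in tail:
            bits = head_bits & bitmaps[item]
            if popcount(bits) >= min_support:
                exts.append((item, bits))
        candidate = head | {item for item, _ in exts}
        # Prune: everything in this subtree lies inside a known maximal itemset.
        if covered(candidate):
            return
        if not exts:
            if head:
                found.append(head)  # no frequent extension and not covered
            return
        # ASSUMED reordering heuristic (not taken from the paper): explore
        # items outside the best-overlapping known maximal itemset first,
        # then lower support first, so later siblings are likelier pruned.
        best = max(found, key=lambda m: len(m & candidate), default=frozenset())
        exts.sort(key=lambda e: (e[0] in best, popcount(e[1]), e[0]))
        for k, (item, bits) in enumerate(exts):
            search(head | {item}, bits, [i for i, _ in exts[k + 1:]])

    all_tids = (1 << len(transactions)) - 1
    search(frozenset(), all_tids, sorted(bitmaps))
    return found
```

For example, calling `mine_maximal_itemsets` on the five transactions `abc`, `abc`, `abd`, `cd`, `ad` with `min_support=2` explores the tree depth-first and skips every branch whose head plus remaining tail is already contained in a mined maximal itemset.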
Source
World Sci-Tech R&D (《世界科技研究与发展》)
CSCD
2010, Issue 4, pp. 440-444 (5 pages)
Keywords
data mining
maximal frequent itemsets
set enumeration tree
dynamic reordering