摘要
基于FP-tree的最大频繁模式挖掘算法是目前较为高效的频繁模式挖掘算法,针对这些算法需要递归生成条件FP-tree、做超集检验等问题,在分析DMFIA-1算法的基础上,提出了最大频繁模式的非检验挖掘算法NCMFP。该算法改进了FP-tree的结构,使挖掘过程中不需要生成条件频繁模式树也不需要超集检验。算法采用的预测剪枝策略减少了挖掘的次数,采用的求取公共交集的方式保证了挖掘结果的完整性。实验结果表明在支持度相对较小情况下,NCMFP的效率是同类算法的2~5倍。
The algorithms based on FP-tree,for mining maximal frequent patterns,have high performance but with many drawbacks.For example,they must recursively generate conditional FP-trees,have to do the process of superset checking.In order to overcome these drawbacks of the existing algorithms,an algorithm Non-Check Mining algorithm of Maximum Frequent Pattern(NCMFP)for mining maximal frequent patterns was put forward after the analysis of DMFIA-1 algorithm.In the algorithm,neither constructing conditional frequent pattern tree recursively nor superset checking was needed through modifying the structure of FP-tree.This algorithm reduced the number of mining through early prediction before mining.The application of a method to get the public intersection sets could obtain a complete result.The experiment shows that the efficiency of NCMFP is two to five times as much as that of the similar algorithms in the case of a relatively small support.
出处
《计算机应用》
CSCD
北大核心
2010年第7期1922-1925,共4页
journal of Computer Applications
基金
江苏省自然科学基金资助项目(BK20003017)
关键词
关联规则
数据挖掘
频繁模式树
最大频繁项集
超集检验
association rule
data mining
Frequent Pattern Tree(FP-tree)
maximum frequent itemsets
superset checking