摘要
FP growth算法是目前已发表的最有效的频繁模式挖掘算法之一 然而 ,由于在挖掘频繁模式时需要递归地生成大量的条件FP 树 ,其时空效率仍然不够高 改进了FP 树结构 ,提出了一种基于被约束子树挖掘频繁项集的有效算法 改进的FP 树是单向的 ,每个结点只保留指向父结点的指针 ,这大约节省了三分之一的树空间 通过引入被约束子树(可以用 3个很小的数组表示 ) ,算法在挖掘频繁模式时不生成条件FP 树 ,从而大大提高了频繁模式挖掘的时空效率 实验表明 ,与FP growth算法相比 ,算法的挖掘速度提高了 1倍以上 ,而所需的存储空间减少了一半 此外 ,随着数据库规模的增大 ,算法具有很好的可伸缩性 对于稠密数据集 ,算法也具有良好的性能 .
FP-growth algorithm is one of the most efficient frequent pattern mining methods published recently. However, FP-growth algorithm must generate a huge number of conditional FP-trees recursively in process of mining, so the efficiency of FP-growth remains unsatisfactory. In this paper, the structure of a traditional FP-tree is improved and an efficient frequent pattern-mining algorithm based on constrained sub-tree is proposed. The new FP-tree is a one-way tree and there is no pointers to point its children in each node, so at least one third of memory is saved compared with the former structure in the same storage of frequent pattern information. By introducing constrained sub-tree (consisting of three small arrays), the proposed algorithm doesn't generate conditional FP-trees in mining process and therefore greatly improves the mining efficiency in both time and space. Experiments show that in comparison with FP-growth, this algorithm has accelerated the mining speed by at least two times and reduced the space consumption by half. Moreover, the algorithm has a very good time and space scalability with the number of transactions, and has an excellent performance in dense database mining as well.
出处
《计算机研究与发展》
EI
CSCD
北大核心
2003年第8期1216-1222,共7页
Journal of Computer Research and Development
基金
河南省自然科学基金 ( 0 1110 60 70 0 )