摘要
FP-Growth算法在关联规则挖掘中是最经典的算法,主要通过频繁模式树(FP树)避免生成候选频繁项目集。针对FP-Growth算法中耗费内存严重的问题,采用链表存储方式,给出了FP-Growth算法的实现方法,其中单个结点采用链表形式来产生,频繁模式树采用左孩子右兄弟的存储结构来组织。在此基础上利用索引表,实现了对频繁模式树中共同前缀结点的快速查找,提高了频繁模式树构造的效率,解决了FP树构造算法中数据存储的瓶颈问题。最后以天体光谱数据和城市土壤数据作为数据集分别对该算法进行测试,实验结果表明,该方法的构造效率要明显优于基于顺序结构的FP-Growth算法。
FP-Growth algorithm is the most typical and the most commonly used construction algorithms of frequent pattern tree(FP-Tree).This article presents a realization method of FP-Growth algorithm based on linked list structure,in which the node of tree is organized by linked list structure,and the frequent pattern tree is stored by using children-brother data structure.On this basis,the quick search for the common and former nodes in the frequent pattern tree by using indexed table becomes true.Accordingly,the time efficiency of the construction of frequent pattern tree is improved,and the bottleneck of data store in the FP-tree construction algorithm is solved.Finally,through the use of the stellar data and urban soil data as the experimental data,the experiment results show that the method is evidently prior to the construction efficiency of the FP-Growth algorithm based on the sequential structure.
出处
《太原科技大学学报》
2013年第2期85-90,共6页
Journal of Taiyuan University of Science and Technology
基金
山西省青年基金(2012021015-4)
山西省高校高新技术产业化项目(20121011)
太原科技大学校青年基金(20123020)
关键词
关联规则
频繁模式
链表结构
索引表
光谱数据
association rules
frequent pattern
linked list structure
indexed table
spectra data