Abstract
To address the high computational cost of complete frequent itemset mining over real-time data streams, an efficient mining algorithm based on an improved FPTree is proposed. The improved FPTree compactly represents all transactions in the sliding window, forming a complete base tree. The alphabetical order of transactions is used to implement insertion into and deletion from the base tree simply, with no need to restructure the tree. A group tree structure is used to traverse the base tree top-down and build the project trees, so that the complete set of frequent itemsets is discovered at low computational cost. Simulation results show that the proposed algorithm discovers the frequent itemsets of a real-time data stream effectively and at lower computational cost.
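The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of a sliding-window prefix tree whose paths follow a fixed alphabetical item order. It is not the paper's algorithm. The names `Node`, `SlidingWindowTree`, `add`, and `support` are hypothetical, and sorting the items inside each transaction is an assumption about what "alphabetical order" means here. The group tree and project-tree construction steps are not reproduced; `support` merely counts matching transactions by a top-down walk.

```python
from collections import deque


class Node:
    """One prefix-tree node: an item, a transaction count, and child nodes."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


class SlidingWindowTree:
    """Hypothetical base tree over the most recent `window_size` transactions."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.root = Node()
        self.window = deque()  # transactions currently in the window, oldest first

    def _update(self, transaction, delta):
        # Walk the path of the (sorted) transaction, adjusting counts by delta.
        node = self.root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item)
            child.count += delta
            if child.count == 0:
                # No transaction in the window uses this branch any more.
                del node.children[item]
                return
            node = child

    def add(self, transaction):
        # Assumption: items are sorted alphabetically, so the item order is
        # fixed and the tree never has to be restructured on insert or delete.
        ordered = tuple(sorted(set(transaction)))
        if len(self.window) == self.window_size:
            self._update(self.window.popleft(), -1)  # expire the oldest one
        self.window.append(ordered)
        self._update(ordered, +1)

    def support(self, itemset):
        """Count the window transactions that contain every item of itemset."""
        target = sorted(set(itemset))
        if not target:
            return len(self.window)

        def walk(node, i):
            if i == len(target):
                return node.count
            total = 0
            for item, child in node.children.items():
                if item == target[i]:
                    total += walk(child, i + 1)
                elif item < target[i]:
                    total += walk(child, i)
                # item > target[i]: target[i] cannot occur deeper on this path
            return total

        return walk(self.root, 0)


tree = SlidingWindowTree(window_size=3)
for t in (["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "c"]):
    tree.add(t)
print(tree.support(["a", "b"]))
```

Because every path lists its items in the same fixed order, adding or expiring a transaction only touches the nodes on its own path, which is the property the abstract attributes to alphabetical ordering.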
Source
Computer Engineering and Design (《计算机工程与设计》)
PKU Core Journal
2017, No. 10, pp. 2759-2766 (8 pages)
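Funding
Soft Science Research Program of the Henan Provincial Department of Science and Technology (152400410345)
Fund Project of the Henan Provincial Department of Education (15A520093)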
Keywords
association rule mining
frequent itemsets
real-time data stream
alphabetical order
project tree traverse
data mining