
Research on Algorithms for Mining Maximum Frequent Itemsets over Uncertain Data Streams (不确定数据流最大频繁项集挖掘算法研究)  Cited by: 9

Mining maximum frequent itemsets over uncertain data streams
Abstract: For large databases, the set of frequent itemsets is huge and redundant, so mining maximum (maximal) frequent itemsets is more suitable because the output is much smaller. Traditional algorithms for mining maximum frequent itemsets, however, assume that precise data are available. On uncertain data streams, the conventional test for whether an itemset is frequent no longer expresses its frequency accurately, and mining is considerably more complicated than on precise streams. The authors also report that no prior work has addressed mining maximum frequent itemsets over uncertain data streams. To address this gap, the paper proposes TUFSMax, a decay-model-based algorithm for mining maximum frequent itemsets over uncertain data streams. The algorithm adopts a decay window model and finds frequent itemsets by calculating expected supports. It also marks tree nodes, which lets it mine all maximum frequent itemsets without superset checking and so saves the time that checking would cost. Experimental results show that the proposed algorithm is efficient in both time and space.
Authors: LIU Huiting (刘慧婷); HOU Mingli (候明利); ZHAO Peng (赵鹏); YAO Sheng (姚晟) (School of Computer Science and Technology, Anhui University, Hefei 230601, China)
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2016, No. 19, pp. 72-77, 93 (7 pages)
Funding: National Natural Science Foundation of China (No. 61202227); Natural Science Foundation of Anhui Province (No. 1408085MF122)
Keywords: uncertain data stream; maximum frequent itemsets; superset checking
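
The abstract mentions two ideas that a small example may make more concrete: expected support under a decay model, and what makes a frequent itemset "maximum". The following is a minimal sketch under stated assumptions, not the TUFSMax algorithm. The function names, the exponential decay factor `decay`, the threshold `min_esup`, and the assumption that items occur independently are all illustrative choices that do not come from the record. The sketch enumerates itemsets by brute force and applies an explicit superset check, which is the step the abstract says TUFSMax avoids by marking tree nodes.

```python
from itertools import combinations
from math import prod


def decayed_expected_support(stream, itemset, decay=0.9):
    """Expected support of `itemset`, with older transactions down-weighted.

    `stream` is a list of uncertain transactions, oldest first. Each
    transaction is a dict mapping an item to its existential probability.
    Assumption: items within a transaction occur independently.
    """
    total = 0.0
    for age, txn in enumerate(reversed(stream)):  # age 0 is the newest
        # Probability that every item of the itemset occurs in this transaction.
        p = prod(txn.get(item, 0.0) for item in itemset)
        total += (decay ** age) * p
    return total


def maximal_frequent_itemsets(stream, min_esup, decay=0.9):
    """Brute-force baseline: find frequent itemsets, then keep the maximal ones."""
    items = sorted({item for txn in stream for item in txn})
    frequent = []
    for size in range(1, len(items) + 1):
        level = [
            frozenset(candidate)
            for candidate in combinations(items, size)
            if decayed_expected_support(stream, candidate, decay) >= min_esup
        ]
        if not level:
            # Expected support never grows when items are added, so no
            # larger itemset can be frequent either.
            break
        frequent.extend(level)
    # Explicit superset check: drop any itemset with a frequent proper superset.
    return [s for s in frequent if not any(s < t for t in frequent)]


stream = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.5, "c": 0.6},
    {"a": 1.0, "b": 0.5, "c": 0.4},
]
result = maximal_frequent_itemsets(stream, min_esup=0.6, decay=0.5)
```

The final list comprehension compares every frequent itemset against every other one, so its cost grows quadratically with the number of frequent itemsets. That cost is the motivation for an approach that can skip superset checking.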


