Top-k Frequent Itemsets Mining Algorithm over Data Streams Based on Matrix (cited by: 3)
Abstract: Traditional data mining algorithms generate large numbers of redundant itemsets when mining frequent itemsets, which degrades mining efficiency. To address this, a matrix-based Top-k frequent itemset mining algorithm over data streams is proposed. Two 0-1 matrices are introduced: a transaction matrix and a 2-itemset matrix. The transaction matrix represents the transaction list in the sliding window model, and the 2-itemset matrix is obtained by computing the support of each row. Candidate itemsets are derived from the 2-itemset matrix, and their support is computed by applying a logical AND to the corresponding rows of the transaction matrix, which yields the Top-k frequent itemsets. The mining results are stored in a data dictionary, so that the Top-k frequent itemsets can be output in descending order of support when a user issues a query. Experimental results show that the algorithm avoids generating redundant itemsets during mining and achieves high time efficiency while preserving accuracy.
Source: Computer Engineering (《计算机工程》), indexed in CAS and CSCD, 2014, No. 3, pp. 55-58, 75 (5 pages)
Keywords: data mining; data stream; sliding window; matrix; Top-k frequent itemset
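
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the matrix layout or the Top-k threshold rule, so several choices here are assumptions. Items are taken as rows and window transactions as columns, each row is stored as an integer bitmask, the 2-itemset matrix is approximated by a set of "linked" pairs, and the pruning threshold is the k-th highest support found so far. The names `build_transaction_matrix`, `support`, and `top_k_itemsets` are hypothetical.

```python
from heapq import nlargest
from itertools import combinations


def build_transaction_matrix(window):
    """Return {item: bitmask}; bit j is 1 if transaction j of the window holds the item.

    Assumption: items are rows and transactions are columns of the 0-1 matrix.
    """
    rows = {}
    for j, transaction in enumerate(window):
        for item in set(transaction):
            rows[item] = rows.get(item, 0) | (1 << j)
    return rows


def support(rows, itemset):
    """AND the rows of all items in the itemset and count the remaining 1 bits."""
    mask = rows[itemset[0]]
    for item in itemset[1:]:
        mask &= rows[item]
    return bin(mask).count("1")


def top_k_itemsets(window, k):
    """Return the k itemsets with the highest support, in descending order of support."""
    rows = build_transaction_matrix(window)
    items = sorted(rows)

    # "Data dictionary" of results: sorted item tuple -> support.
    found = {(i,): support(rows, (i,)) for i in items}
    pairs = {p: support(rows, p) for p in combinations(items, 2)}
    pairs = {p: s for p, s in pairs.items() if s > 0}
    found.update(pairs)

    level = list(pairs)
    while level:
        # Assumed threshold rule: the k-th highest support seen so far.
        best = nlargest(k, found.values())
        threshold = best[-1] if len(best) == k else 1
        # Stand-in for the 0-1 2-itemset matrix: pairs that still reach the threshold.
        linked = {p for p, s in pairs.items() if s >= threshold}

        next_level = []
        for itemset in level:
            if found[itemset] < threshold:
                continue
            for item in items:
                # Extend only with later items whose pairs with every member are linked.
                if item > itemset[-1] and all((x, item) in linked for x in itemset):
                    candidate = itemset + (item,)
                    s = support(rows, candidate)
                    if s >= threshold:
                        found[candidate] = s
                        next_level.append(candidate)
        level = next_level

    ranked = sorted(found.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]
```

Because support can only shrink as an itemset grows, a candidate is generated only when all of its 2-item subsets are still linked, which is one plausible reading of how the 2-itemset matrix avoids redundant candidates. For a real stream, the window could be held in a bounded queue and the matrix updated as transactions enter and leave; that part is left out of the sketch.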