Abstract
Traditional data mining algorithms generate large numbers of redundant itemsets when mining frequent itemsets, which reduces mining efficiency. To address this, a matrix-based algorithm for mining Top-k frequent itemsets over data streams is proposed. The algorithm introduces two 0-1 matrices: a transaction matrix and a 2-itemset matrix. The transaction matrix represents the transaction list in the sliding-window model, and the 2-itemset matrix is obtained by computing the support of each row. Candidate itemsets are derived from the 2-itemset matrix, and their supports are computed by applying a logical AND to the corresponding rows of the transaction matrix, which yields the Top-k frequent itemsets. The mining results are stored in a data dictionary so that, when a user issues a query, the Top-k frequent itemsets can be output in descending order of support. Experimental results show that the algorithm avoids generating redundant itemsets during mining and achieves high time efficiency while preserving accuracy.
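The matrix idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact candidate-generation rule or how the Top-k threshold is maintained, so the function name `mine_top_k`, the `min_support` parameter, the pair-based pruning rule, and the tie-breaking order are all assumptions made for illustration. Each row of the transaction matrix is packed into a Python integer (one bit per transaction in the window), so the support of an itemset is the number of 1 bits left after a bitwise AND of its rows.

```python
from itertools import combinations


def mine_top_k(window, k, min_support=1):
    """Sketch of matrix-based Top-k mining over one sliding window.

    `window` is a list of transactions (sets of comparable items).
    `min_support` is an assumed pruning threshold, not taken from the paper.
    """
    items = sorted({item for txn in window for item in txn})

    # Transaction matrix: one 0-1 row per item, one column (bit) per transaction.
    rows = {
        item: sum(1 << col for col, txn in enumerate(window) if item in txn)
        for item in items
    }

    def support(itemset):
        # Logical AND of the corresponding rows, then count the 1 bits.
        bits = rows[itemset[0]]
        for item in itemset[1:]:
            bits &= rows[item]
        return bin(bits).count("1")

    # 2-itemset matrix: entry (a, b) is True when the pair {a, b} is frequent.
    pair_ok = {
        (a, b): support((a, b)) >= min_support
        for a, b in combinations(items, 2)
    }

    # Data dictionary: itemset -> support.
    results = {}
    for item in items:
        s = support((item,))
        if s >= min_support:
            results[(item,)] = s

    # Assumed rule: extend an itemset only with items whose pairs are all marked.
    frontier = [pair for pair, ok in pair_ok.items() if ok]
    while frontier:
        next_frontier = []
        for itemset in frontier:
            s = support(itemset)
            if s < min_support:
                continue
            results[itemset] = s
            for item in items:
                if item > itemset[-1] and all(pair_ok[(x, item)] for x in itemset):
                    next_frontier.append(itemset + (item,))
        frontier = next_frontier

    # Descending support; ties broken by shorter, then lexicographically smaller itemset.
    ranked = sorted(results.items(), key=lambda kv: (-kv[1], len(kv[0]), kv[0]))
    return ranked[:k]
```

A call such as `mine_top_k([{"a", "b"}, {"a", "c"}, {"a", "b", "c"}], k=3)` returns a list of `(itemset, support)` pairs in descending order of support. In a streaming setting the `window` list would be refreshed as transactions arrive and expire, for example with a fixed-length `collections.deque`.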
Source
《计算机工程》
CAS
CSCD
2014, Issue 3, pp. 55-58, 75 (5 pages in total)
Computer Engineering