Abstract
Association rule mining often produces tens of thousands of rules, which makes the results hard to understand and apply. To address this problem, an algorithm for mining non-redundant association rules based on closed itemsets is proposed. First, non-redundant association rules are defined, and the rationality of the definition is justified in terms of rule conviction. Second, building on generators, closed itemsets, and non-redundant association rules, the non-redundant min-max precise rule basis and the non-redundant min-max approximate rule basis are defined, and their pruning strategies are discussed. Finally, the properties of generators and their join strategies are discussed, and a breadth-first algorithm for mining non-redundant association rules is given on the basis of a subsume index. Experimental results show that the proposed algorithm discovers a smaller set of non-redundant association rules, which improves the understandability of the mining results, and that it is also efficient.
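The notions of generator, closed itemset, and min-max precise rule named in the abstract can be illustrated with a minimal brute-force sketch. It is not the paper's breadth-first, subsume-index algorithm, and it does not apply the paper's additional non-redundancy pruning. The toy transactions, the threshold, and all function names are assumptions made for illustration only. A rule is formed here as generator → (closure minus generator), which always has confidence 1.

```python
from itertools import combinations

# Hypothetical toy database; not taken from the paper.
TRANSACTIONS = [
    frozenset("ACD"),
    frozenset("BCE"),
    frozenset("ABCE"),
    frozenset("BE"),
    frozenset("ABCE"),
]
MIN_SUPPORT = 2  # assumed absolute support threshold


def support(itemset):
    """Count the transactions that contain every item of itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def closure(itemset):
    """Closed itemset of itemset: intersection of all transactions containing it."""
    covering = [t for t in TRANSACTIONS if itemset <= t]
    return frozenset.intersection(*covering) if covering else frozenset()


def frequent_itemsets():
    """Enumerate every frequent itemset by brute force (fine for toy data only)."""
    items = sorted(set().union(*TRANSACTIONS))
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if support(candidate) >= MIN_SUPPORT:
                yield candidate


def is_generator(itemset):
    """A generator has strictly lower support than each subset missing one item."""
    s = support(itemset)
    return all(support(itemset - {item}) > s for item in itemset)


def precise_rules():
    """Rules generator -> (closure - generator); each holds with confidence 1."""
    rules = set()
    for x in frequent_itemsets():
        if is_generator(x):
            closed = closure(x)
            if closed != x:
                rules.add((x, closed - x))
    return rules


if __name__ == "__main__":
    for antecedent, consequent in sorted(
        precise_rules(), key=lambda r: (len(r[0]), sorted(r[0]))
    ):
        print("".join(sorted(antecedent)), "->", "".join(sorted(consequent)))
```

The brute-force enumeration is exponential in the number of items, so it only serves to make the definitions concrete; avoiding that cost is the purpose of a level-wise algorithm such as the one the abstract describes.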
Source
Journal of Beijing Jiaotong University (《北京交通大学学报》)
CAS
CSCD
PKU Core Journal (北大核心)
2009, Issue 6, pp. 91-96 (6 pages)
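Funding
Funding Project for Academic Human Resources Development in Institutions of Higher Learning under the Jurisdiction of Beijing Municipality
North China University of Technology Key Youth Research Fund
North China University of Technology Doctoral Research Start-up Fund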
Keywords
data mining
non-redundant association rule
generator
closed itemset
subsume index