Mining Non-Redundant Association Rules Based on Closed Itemsets

Original Chinese title: 一种基于闭项集的无冗余关联规则挖掘方法. Cited by: 2

Abstract: Association rule mining often produces tens of thousands of rules, which makes the results hard to understand and apply. To address this problem, an algorithm for mining non-redundant association rules based on closed itemsets is proposed. First, non-redundant association rules are defined, and the rationality of the definition is justified using the conviction measure. Second, building on generators, closed itemsets, and non-redundant association rules, the non-redundant min-max precise rule basis and the non-redundant min-max approximate rule basis are defined, and their pruning strategies are discussed. Finally, the properties of generators and their join (connection) strategies are discussed, and a breadth-first algorithm for mining non-redundant association rules is given on the basis of the subsume index. Experimental results show that the proposed algorithm discovers a smaller set of non-redundant rules, which improves the understandability of the mining results, and that it is also efficient.
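The abstract relies on three notions without defining them: closed itemsets, generators, and conviction. The Python sketch below illustrates their standard textbook definitions on a tiny transaction database. It is not the paper's algorithm: the database, the function names, and the brute-force enumeration are assumptions made for illustration, and the rule form "generator -> closure minus generator" follows the commonly used min-max exact rule construction, not necessarily the paper's non-redundant variant.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy database; not taken from the paper.
TRANSACTIONS = [
    frozenset("abc"),
    frozenset("ab"),
    frozenset("abc"),
    frozenset("bc"),
    frozenset("c"),
]
ITEMS = frozenset().union(*TRANSACTIONS)


def count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def closure(itemset):
    """Largest superset of `itemset` with the same support count."""
    covering = [t for t in TRANSACTIONS if itemset <= t]
    if not covering:
        return ITEMS  # formal closure of an itemset that never occurs
    return frozenset.intersection(*covering)


def is_closed(itemset):
    """An itemset is closed when it equals its own closure."""
    return closure(itemset) == itemset


def is_generator(itemset):
    """True if no proper subset has the same support count.

    Support is anti-monotone, so checking only the subsets that drop
    a single item is sufficient.
    """
    return all(count(itemset - {i}) != count(itemset) for i in itemset)


def conviction(antecedent, consequent):
    """Conviction of antecedent -> consequent (infinite at confidence 1).

    Assumes the antecedent occurs in at least one transaction.
    """
    confidence = Fraction(count(antecedent | consequent), count(antecedent))
    if confidence == 1:
        return float("inf")
    consequent_support = Fraction(count(consequent), len(TRANSACTIONS))
    return (1 - consequent_support) / (1 - confidence)


def exact_rules():
    """Rules g -> closure(g) minus g for each generator g that is not closed."""
    rules = []
    for size in range(1, len(ITEMS) + 1):
        for combo in combinations(sorted(ITEMS), size):
            g = frozenset(combo)
            if count(g) > 0 and is_generator(g) and not is_closed(g):
                rules.append((g, closure(g) - g))
    return rules
```

On this toy data, {a} is a generator whose closure is {a, b}, so the sketch yields the confidence-1 rule {a} -> {b}. It also yields {a, c} -> {b}, which adds nothing once {a} -> {b} is known; this is the kind of redundancy that non-redundant rule bases aim to prune, although the abstract does not state the paper's exact criterion.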

Authors: Song Wei (宋威), Gao Lei (高磊), Li Jinhong (李晋宏)

Affiliation: College of Information Engineering, North China University of Technology (北方工业大学信息工程学院)

Source: Journal of Beijing Jiaotong University (《北京交通大学学报》), 2009, No. 6, pp. 91-96 (6 pages). Indexed in: CAS, CSCD, PKU Core (北大核心).

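Funding: Beijing municipal universities talent development program (北京市市属高等学校人才强教计划项目); North China University of Technology Key Research Fund for Young Scholars; North China University of Technology Doctoral Research Start-up Fund.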

Keywords: data mining; non-redundant association rule; generator; closed itemset; subsume index

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

