Abstract
With the arrival of the big data era, incremental association rule mining has become a popular topic in data mining. CAN-tree is an important algorithm for incremental association rule mining, but ordering items by frequency makes the tree too large and lowers the algorithm's efficiency. To address this problem, an incremental association mining algorithm based on AP-CAN is proposed. Following the idea of affinity propagation (AP) clustering, the algorithm divides the original data set into several clusters according to item support, prunes the clusters that do not satisfy the minimum support, replaces the item header table with a hash header table, and sorts each transaction according to data volume. Experimental results show that the method significantly reduces the size of the CAN-tree, shortens item lookup time, and improves mining efficiency, outperforming the existing CAN-tree algorithm in both efficiency and stability.
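The abstract does not specify how item supports are clustered or exactly which clusters count as failing the minimum support. The Python sketch below is one possible reading, under stated assumptions: an item's support is the fraction of transactions containing it, the supports are clustered with textbook affinity propagation (negative squared distance as similarity, the median similarity as preference, damping 0.5), and a cluster is pruned only when even its highest-support item is below the minimum support. The names `item_supports`, `affinity_propagation_1d` and `cluster_and_prune`, and all parameter values, are hypothetical and not the authors' implementation.

```python
from collections import Counter
from statistics import median


def item_supports(transactions):
    """Relative support of every item: fraction of transactions containing it."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c / n for item, c in counts.items()}


def affinity_propagation_1d(values, damping=0.5, max_iter=200):
    """Plain affinity propagation on scalar values.

    Similarity is the negative squared distance; the preference (self-similarity)
    is the median off-diagonal similarity. Returns each point's exemplar index.
    """
    n = len(values)
    if n == 1:
        return [0]
    s = [[-(values[i] - values[k]) ** 2 for k in range(n)] for i in range(n)]
    pref = median(s[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        s[i][i] = pref
    r = [[0.0] * n for _ in range(n)]
    a = [[0.0] * n for _ in range(n)]
    for _ in range(max_iter):
        # Responsibility: how well k suits i compared with i's best alternative.
        for i in range(n):
            scores = [a[i][k] + s[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            second = max(scores[k] for k in range(n) if k != best)
            for k in range(n):
                rival = second if k == best else scores[best]
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - rival)
        # Availability: accumulated evidence that k should be an exemplar.
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    if not exemplars:  # no clear exemplar: fall back to a single cluster
        exemplars = [max(range(n), key=lambda k: r[k][k] + a[k][k])]
    return [i if i in exemplars else max(exemplars, key=lambda k: s[i][k])
            for i in range(n)]


def cluster_and_prune(transactions, min_sup):
    """Cluster items by support, then drop clusters that cannot hold a frequent item."""
    sup = item_supports(transactions)
    items = sorted(sup)
    labels = affinity_propagation_1d([sup[x] for x in items])
    clusters = {}
    for item, exemplar in zip(items, labels):
        clusters.setdefault(exemplar, []).append(item)
    # Assumed pruning rule (not from the paper): discard a cluster only when
    # its best item is infrequent, so no item with support >= min_sup is lost.
    kept = [c for c in clusters.values() if max(sup[x] for x in c) >= min_sup]
    return sup, list(clusters.values()), kept
```

Keeping a cluster whenever any member is frequent is a conservative choice: it never discards a frequent item, at the cost of carrying some infrequent neighbours into the tree.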
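The tree side can be illustrated with a prefix tree that inserts every transaction in a fixed canonical order, so later increments never force existing paths to be re-sorted, and whose header table is a hash map (a Python `dict`) from item to the list of nodes carrying it, giving direct lookup instead of a scan. This is a minimal sketch under stated assumptions: the ordering is supplied through a hypothetical `rank` mapping (lexicographic when absent), items missing from `rank` are treated as pruned and skipped, and the paper's per-transaction ordering "by data volume" is not reproduced because the abstract does not define it. `CanTree`, `support_count` and `prefix_paths` are illustrative names.

```python
from collections import defaultdict


class Node:
    __slots__ = ("item", "count", "parent", "children")

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> child Node


class CanTree:
    """Prefix tree with a fixed (canonical) item order and a hash header table."""

    def __init__(self, rank=None):
        self.root = Node(None, None)
        self.rank = rank  # hypothetical item -> position map; None = lexicographic
        self.head = defaultdict(list)  # hash header table: item -> its nodes

    def insert(self, transaction):
        items = set(transaction)
        if self.rank is not None:
            items = {x for x in items if x in self.rank}  # skip pruned items
            ordered = sorted(items, key=lambda x: (self.rank[x], x))
        else:
            ordered = sorted(items)
        node = self.root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                self.head[item].append(child)  # register the new node once
            child.count += 1
            node = child

    def support_count(self, item):
        # One hash lookup finds every node of the item; no header-table scan.
        return sum(n.count for n in self.head.get(item, ()))

    def prefix_paths(self, item):
        """Conditional pattern base: (path from root, count) per node of item."""
        paths = []
        for n in self.head.get(item, ()):
            path, p = [], n.parent
            while p is not None and p.item is not None:
                path.append(p.item)
                p = p.parent
            paths.append((path[::-1], n.count))
        return paths
```

For example, building `CanTree()` from `["b", "a"]`, `["a", "c"]` and `["a", "b", "c"]` gives `support_count("a") == 3` and `prefix_paths("c") == [(["a"], 1), (["a", "b"], 1)]`; inserting a further transaction only extends or increments paths and never reorders them.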
Authors
HONG Yan
ZHANG Lei
YAN Jiaqi
College of Electrical and Information Engineering, Anhui University of Science and Technology, Huainan 232001, China
Source
Journal of Anqing Normal University (Natural Science Edition)
2021, No. 2, pp. 20-25 (6 pages)
Funding
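Young Scientists Fund of the National Natural Science Foundation of China (61501006)
General Program of the Natural Science Foundation of Anhui Province (1808085MF169)
Natural Science Research Project of Anhui Higher Education Institutions (KJ2018A0086)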