Abstract
The Apriori algorithm is the classic algorithm for association rule mining. On very large datasets, however, its efficiency drops sharply because of excessive disk I/O. This paper proposes an approach that partitions the dataset according to the memory of the processing machine, so that each partition fits directly in RAM; applies the Apriori algorithm to each partition to find local association rules; and then, on the basis of all the local rules found, uses a general genetic algorithm (GGA) to find the global association rules. Because disk operations are greatly reduced, the approach is more efficient than the traditional Apriori algorithm on very large datasets.
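As a rough illustration of the first two stages described above (partitioning the dataset into memory-sized pieces and running Apriori on each piece), the following Python sketch may help. It is not the paper's implementation: the function names `apriori`, `partition`, and `local_frequent_itemsets`, the fixed `chunk_size` parameter standing in for "what fits in RAM", and the use of a relative `min_support` per partition are all assumptions made for illustration. It stops at frequent itemsets rather than full rules, and the genetic-algorithm stage is omitted because the abstract gives no details of it.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets frequent within `transactions`."""
    transactions = [frozenset(t) for t in transactions]
    min_count = min_support * len(transactions)

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates,
        # keeping only those whose (k-1)-subsets are all frequent (pruning).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count step: one pass over the in-memory partition per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent


def partition(transactions, chunk_size):
    """Yield consecutive slices; chunk_size is a hypothetical stand-in
    for the number of transactions that fit in memory."""
    for start in range(0, len(transactions), chunk_size):
        yield transactions[start:start + chunk_size]


def local_frequent_itemsets(transactions, chunk_size, min_support):
    """Union of the itemsets that are frequent in at least one partition."""
    found = set()
    for chunk in partition(transactions, chunk_size):
        found |= set(apriori(chunk, min_support))
    return found
```

In this sketch the union of locally frequent itemsets would serve as the pool from which a later global search (a genetic algorithm, in the paper's approach) selects; how that search is encoded and evaluated is left open here.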
Source
《计算机与现代化》
2004, No. 11, pp. 1-3, 6 (4 pages in total)
Computer and Modernization
Funding
Doctoral Program Foundation of the Ministry of Education of China (98061117)
Chongqing Applied Basic Research Project (7369).