Abstract
Based on an analysis of how redundancy arises in association rule mining, this paper proposes a penalty function for setting the support threshold, together with an improved genetic algorithm. The improved algorithm uses frequent item distribution, prime factor coding, mate selection, and a sharing function so that chromosomes always search in regions dense with frequent items, which effectively prunes the combinatorial search space. In addition, transactions are converted to numerical form, which compresses the storage needed for the transaction database and speeds up computation. Experimental results indicate that the improved mining method has some advantage in the efficiency and precision of discovering valuable rules.
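The abstract does not spell out how prime factor coding or the numerical conversion of transactions works. As a rough illustration only, one common reading is that each item is mapped to a distinct prime and each transaction is stored as the product of its items' primes, so an itemset is contained in a transaction exactly when its code divides the transaction's code. The sketch below follows that assumption; the helper names (`assign_primes`, `encode`, `support`) and the sample data are hypothetical and are not taken from the paper.

```python
from math import prod


def assign_primes(items):
    """Map each distinct item to a distinct prime (hypothetical helper)."""
    primes = []
    mapping = {}
    candidate = 2
    for item in sorted(items):
        # Advance to the next prime by trial division against earlier primes.
        while any(candidate % p == 0 for p in primes):
            candidate += 1
        primes.append(candidate)
        mapping[item] = candidate
        candidate += 1
    return mapping


def encode(itemset, mapping):
    """Encode an itemset (or a transaction) as the product of its primes."""
    return prod(mapping[item] for item in itemset)


def support(itemset_code, encoded_db):
    """Fraction of encoded transactions divisible by the itemset code."""
    hits = sum(1 for code in encoded_db if code % itemset_code == 0)
    return hits / len(encoded_db)


# Made-up transactions, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
all_items = set().union(*transactions)
mapping = assign_primes(all_items)
encoded_db = [encode(t, mapping) for t in transactions]

print(support(encode({"bread", "milk"}, mapping), encoded_db))
```

Under this assumed scheme each transaction is held as a single integer, and a containment check is one modulo operation, which is consistent with the abstract's claim of reduced storage and faster computation. The products grow quickly as the number of items increases, so a practical implementation would need to account for that.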
Source
《计算机工程与应用》
CSCD
PKU Core Journal (北大核心)
2008, No. 14, pp. 155-158, 165 (5 pages)
Computer Engineering and Applications
Keywords
association rules
genetic algorithm
frequent item distribution
prime factor coding
mate selection