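Abstract: I. Introduction. With the development of database and machine-learning technology, discovering novel and potentially useful knowledge in databases, known as KDD (Knowledge Discovery in Database), has become an emerging research field in recent years. Association rules in KDD describe the (latent) relationships that exist among data items (attributes, variables) in a database. We define them formally as follows. Let I = {i1, i2, …, im} be the set of items (itemset), and let D be a transaction database in which each transaction T is a subset of the items (T ⊆ I) and has a unique identifier ID. An association rule is a logical implication of the form X ⇒ Y, where X ⊆ T, Y ⊆ T, and X ∩ Y = ∅. Two factors are associated with such a rule. If s% of the transactions in the database contain X ∪ Y, the rule X ⇒ Y is said to have support s. If c% of the transactions that contain X also contain Y, the rule X ⇒ Y is said to have confidence c.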
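The two definitions above can be illustrated with a minimal Python sketch. The function names `support` and `confidence` and the toy database `D` are assumptions made for this example only; they do not come from the paper.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Percentage of transactions that contain every item of `itemset`."""
    if not transactions:
        raise ValueError("the transaction database is empty")
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return 100.0 * hits / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               x: Iterable[str], y: Iterable[str]) -> float:
    """Percentage of the transactions containing X that also contain Y."""
    x_items = frozenset(x)
    xy_items = x_items | frozenset(y)
    with_x = [t for t in transactions if x_items <= t]
    if not with_x:
        raise ValueError("no transaction contains X")
    hits = sum(1 for t in with_x if xy_items <= t)
    return 100.0 * hits / len(with_x)


# Hypothetical toy database: each transaction is a set of item names.
D = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

s = support(D, {"bread", "milk"})        # support of {bread} => {milk}
c = confidence(D, {"bread"}, {"milk"})   # confidence of {bread} => {milk}
print(f"support = {s:.1f}%, confidence = {c:.1f}%")
```

In this toy database, 2 of the 4 transactions contain both bread and milk, and 2 of the 3 transactions containing bread also contain milk, so the rule {bread} ⇒ {milk} has a support of 50% and a confidence of about 66.7%.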
Abstract: Data-mining techniques have been developed to turn data into useful, task-oriented knowledge. Most algorithms for mining association rules identify relationships among transactions using binary values and find rules at a single concept level. Extracting multilevel association rules from transaction databases is a common data-mining task. This paper proposes a multilevel fuzzy association rule mining model for extracting implicit knowledge that is stored as quantitative values in transactions. To this end, it uses a different support value at each level and a different membership function for each item. By integrating fuzzy-set concepts, data-mining technologies, and a multiple-level taxonomy, our method finds fuzzy association rules in transaction data sets. The method adopts a top-down, progressively deepening approach to derive large itemsets and uses fuzzy boundaries instead of sharp boundary intervals. A comparison with previous methods in simulation shows that the proposed method maintains higher precision, that the mined rules are closer to reality, and that it can mine association rules at different levels according to the user's preference.
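The abstract does not give the paper's formulas, so the following Python sketch only illustrates one common way to compute a fuzzy support over quantitative transactions. It assumes triangular membership functions, the minimum as the combination operator, and hypothetical per-level thresholds; all names and values (`triangular`, `fuzzy_support`, `membership`, `min_support_by_level`) are illustrative assumptions and not the authors' method.

```python
from typing import Dict, List, Tuple


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 at or outside [a, c], 1 at the peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzy_support(transactions: List[Dict[str, float]],
                  membership: Dict[str, Tuple[float, float, float]]) -> float:
    """Average over all transactions of the minimum membership degree
    of the purchased quantity of each item in the fuzzy itemset."""
    if not transactions:
        raise ValueError("the transaction database is empty")
    total = 0.0
    for t in transactions:
        # An absent item counts as quantity 0.
        degrees = [triangular(t.get(item, 0.0), *abc)
                   for item, abc in membership.items()]
        total += min(degrees)
    return total / len(transactions)


# Hypothetical quantitative transactions: item -> purchased quantity.
transactions = [
    {"milk": 4, "bread": 2},
    {"milk": 2, "bread": 2},
    {"milk": 4},
]

# Assumed membership function (a, b, c) for one fuzzy region per item.
membership = {"milk": (0, 4, 8), "bread": (0, 2, 4)}

# Assumed minimum supports, one per taxonomy level.
min_support_by_level = {1: 0.6, 2: 0.4}

fs = fuzzy_support(transactions, membership)
for level, threshold in min_support_by_level.items():
    print(f"level {level}: fuzzy support {fs:.2f} >= {threshold}? {fs >= threshold}")
```

Here the three transactions contribute degrees of 1.0, 0.5, and 0.0, giving a fuzzy support of 0.5. Under the assumed thresholds the itemset would count as large at level 2 but not at level 1, which shows why a separate support value per level changes which itemsets survive.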
Funding: The National Natural Science Foundation of China under Grant No. 600773169; the National Great Project of Scientific and Technical Supporting Programs Funded by Ministry of Science & Technology of China During the 11th Five-year Plan under Grant No. 2006BAI05A01