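Association rule mining is an important research topic in data mining. Big data processing places higher demands on the efficiency of association rule mining algorithms, and the most time-consuming step of association rule mining is frequent pattern mining. To address the low efficiency of current frequent pattern mining algorithms, this work combines the Apriori and FP-growth algorithms and proposes a frequent pattern mining algorithm based on transaction mapping and interval intersection (interval interaction and transaction mapping, IITM). The algorithm scans the dataset only twice to build the FP-tree, then scans the FP-tree to map the ID of each item to intervals, and performs pattern growth by intersecting those intervals. This avoids the efficiency loss caused by Apriori's repeated scans of the dataset and by FP-growth's iterative construction of conditional FP-trees during pattern growth. Experiments on real datasets show that IITM outperforms Apriori, FP-growth, and PIETM at different support thresholds.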
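The interval-intersection step can be illustrated with a minimal sketch. It is not the IITM implementation: it assumes that each item has already been mapped to a sorted list of disjoint closed integer intervals of IDs, and that the support of an itemset is the number of IDs covered by the intersection. The names `intersect`, `support`, and `item_intervals`, and the sample intervals, are hypothetical.

```python
from typing import Dict, List, Tuple

# Closed interval [start, end] of (hypothetical) mapped IDs.
Interval = Tuple[int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted lists of disjoint closed intervals."""
    result: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:  # the two current intervals overlap
            result.append((lo, hi))
        # Advance the list whose current interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def support(intervals: List[Interval]) -> int:
    """Count the IDs covered by a list of disjoint closed intervals."""
    return sum(hi - lo + 1 for lo, hi in intervals)


# Assumed mapping from items to ID intervals (illustrative data only).
item_intervals: Dict[str, List[Interval]] = {
    "a": [(1, 6)],
    "b": [(1, 3), (7, 8)],
    "c": [(2, 3), (5, 5), (7, 7)],
}

# Pattern growth: extend {a} to {a, b}, then to {a, b, c}.
ab = intersect(item_intervals["a"], item_intervals["b"])
abc = intersect(ab, item_intervals["c"])
```

Under these assumptions, growing a pattern by one item costs a single linear merge of two interval lists, with no rescan of the dataset and no conditional tree.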
For the case of unknown weights, multi-attribute decision making based on interval numbers and grey relational analysis was used to optimize the selection of mining methods. Because a weight represents the independence of an indicator, the smaller an indicator's correlation with the other indicators, the greater its weight. Hence, the weights of the interval-valued indicators were determined from correlation coefficients. Relative closeness to the positive and negative ideal solutions was calculated by introducing a distance between interval numbers, which makes the decision more rational and comprehensive. A new method of ranking interval numbers based on the normal distribution was proposed for the optimization of mining methods, and its basic properties were discussed. Finally, the feasibility and effectiveness of the method were verified in theory and in practice.
Funding: Project (50774095) supported by the National Natural Science Foundation of China; Project (200449) supported by the National Outstanding Doctoral Dissertations Special Funds of China
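The relative-closeness calculation for interval-valued indicators can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the distance between two interval numbers is assumed to be the root mean square of the endpoint differences, all indicators are assumed to be normalized benefit-type indicators, and the weights are supplied directly rather than derived from correlation coefficients. The normal-distribution ranking of interval numbers is not reproduced. The names `interval_distance` and `relative_closeness`, and the sample data, are hypothetical.

```python
import math
from typing import List, Tuple

# Interval number [lower, upper].
Interval = Tuple[float, float]


def interval_distance(a: Interval, b: Interval) -> float:
    """Assumed distance: root mean square of the endpoint differences."""
    return math.sqrt(((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) / 2)


def relative_closeness(matrix: List[List[Interval]],
                       weights: List[float]) -> List[float]:
    """Closeness of each alternative to the positive ideal solution.

    Assumes benefit-type indicators and at least two distinct
    alternatives (otherwise both distances are zero).
    """
    n = len(weights)
    # Positive / negative ideal: best / worst endpoints per indicator.
    pos = [(max(r[j][0] for r in matrix), max(r[j][1] for r in matrix))
           for j in range(n)]
    neg = [(min(r[j][0] for r in matrix), min(r[j][1] for r in matrix))
           for j in range(n)]
    scores = []
    for row in matrix:
        d_pos = sum(w * interval_distance(x, p)
                    for w, x, p in zip(weights, row, pos))
        d_neg = sum(w * interval_distance(x, q)
                    for w, x, q in zip(weights, row, neg))
        scores.append(d_neg / (d_pos + d_neg))
    return scores


# Three hypothetical mining methods scored on two indicators.
matrix = [
    [(0.8, 0.9), (0.7, 0.8)],  # method A
    [(0.5, 0.6), (0.4, 0.5)],  # method B
    [(0.2, 0.3), (0.1, 0.2)],  # method C
]
weights = [0.6, 0.4]  # assumed weights, not computed from correlations
scores = relative_closeness(matrix, weights)
```

A larger score means the alternative lies closer to the positive ideal and farther from the negative ideal, so with this data method A ranks first and method C last.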