Abstract
In association rule mining, the items in a dataset differ in importance, and that importance is hard to assign subjectively, which directly affects the mining results. To address this problem, the concepts of weighted itemsets and weighted association rules are defined. The weight of a single attribute is determined by information entropy, and the weight of a multi-item itemset is determined by a compromise between the geometric mean and the maximum of its item weights, so that important items are highlighted while the overall weight is still taken into account. On this basis, weighted frequent patterns are extracted with a weighted frequent pattern tree, and a method for constructing the tree is given. Finally, experiments on celestial spectral data provided by the National Astronomical Observatories and on mechanical equipment EDEM data verify the high efficiency of the algorithm.
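The weighting scheme described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration: the function names `entropy_weights` and `itemset_weight` are hypothetical, attribute weights are taken to be proportional to the Shannon entropy of each attribute's value distribution, and the "compromise" is modelled as a convex combination (parameter `alpha`) of the geometric mean and the maximum item weight. The paper's actual definitions may differ.

```python
import math
from collections import Counter


def entropy_weights(columns):
    """Assign each attribute a weight proportional to its Shannon entropy.

    `columns` maps an attribute name to its list of discrete values.
    ASSUMPTION: proportional-to-entropy weighting, normalised to sum to 1;
    the abstract only says weights are "determined by information entropy".
    """
    entropies = {}
    for name, values in columns.items():
        n = len(values)
        counts = Counter(values)
        # H = sum p * log(1/p), with p = c / n for each distinct value
        entropies[name] = sum((c / n) * math.log(n / c) for c in counts.values())
    total = sum(entropies.values())
    if total == 0:
        # Every attribute is constant: fall back to equal weights.
        return {name: 1.0 / len(columns) for name in columns}
    return {name: h / total for name, h in entropies.items()}


def itemset_weight(item_weights, alpha=0.5):
    """Weight of a multi-item itemset.

    ASSUMPTION: the compromise is alpha * geometric mean + (1 - alpha) * max.
    The geometric mean reflects the overall weight; the max highlights the
    most important item.
    """
    geometric_mean = math.prod(item_weights) ** (1.0 / len(item_weights))
    return alpha * geometric_mean + (1.0 - alpha) * max(item_weights)
```

With `alpha` close to 1 the itemset weight follows the overall (geometric mean) weight, while values close to 0 let a single important item dominate.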
Source
Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》)
Indexed in: EI, CSCD, Peking University Core Journals
2014, No. 1, pp. 28-34 (7 pages)
Funding
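Supported by:
National Natural Science Foundation of China (No. 41140027)
Youth Foundation of Shanxi Province (No. 2012021015-4)
Science and Technology Innovation Project of Higher Education Institutions of Shanxi Province (No. 20121011)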
Keywords
Association Rule, Information Entropy, Frequent Pattern