
Frequent pattern classification method based on information gain
Abstract: Research on classification based on frequent patterns is still at an early stage, but it has achieved initial success in classifying relational data, text documents, and graphs. This paper systematically studies frequent pattern classification based on information gain, proposes an information gain based frequent pattern classification model (IGFPC), and argues for the feasibility of the model on theoretical grounds. By establishing a connection between pattern frequency and discriminative measures such as information gain, it develops a method for setting the minimum support threshold when mining useful frequent patterns. Based on this method, together with a proposed feature selection algorithm (IGPS), discriminative frequent patterns are generated for building high-quality pattern classifiers. Experiments show that the information gain based frequent pattern classification framework achieves good scalability and high accuracy when classifying large datasets.
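Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2009, No. 7, pp. 159-163 (5 pages)
Funding: National Natural Science Foundation of China (No. NSFC-60273094); Ningbo Natural Science Foundation (No. 2006A610012)
Keywords: information gain; frequent pattern; classification; discriminative measure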
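
The abstract links a pattern's frequency (support) to its discriminative power as measured by information gain. The sketch below illustrates that link using the textbook definition IG(C|X) = H(C) - H(C|X), where X is the binary feature "the transaction contains the pattern". It is not the paper's IGFPC model or IGPS algorithm, whose details are not given here; the function names and the toy data are assumptions made for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def support_and_information_gain(transactions, labels, pattern):
    """Return (support, information gain) of an itemset pattern.

    The pattern is treated as a binary feature: a transaction either
    contains every item of the pattern or it does not.
    Hypothetical helper, not taken from the paper.
    """
    pattern = set(pattern)
    covered, uncovered = [], []
    for items, label in zip(transactions, labels):
        (covered if pattern <= set(items) else uncovered).append(label)
    support = len(covered) / len(labels)
    # Conditional entropy H(C|X), weighted by how often the pattern occurs.
    conditional = support * entropy(covered) + (1 - support) * entropy(uncovered)
    return support, entropy(labels) - conditional


# Toy data (assumed): 8 transactions, balanced binary classes.
transactions = [
    {"a", "b"}, {"a", "b"}, {"a", "b"}, {"a", "b", "r"},
    {"c"}, {"c"}, {"c"}, {"c"},
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for pattern in ({"a", "b"}, {"r"}):
    s, ig = support_and_information_gain(transactions, labels, pattern)
    print(sorted(pattern), f"support={s:.3f}", f"IG={ig:.3f}")
```

In this toy data the pattern `{"r"}` is perfectly pure (it occurs only in class 1) but covers a single transaction, so its information gain stays small, while `{"a", "b"}` covers half of the data and separates the classes completely. This is the intuition behind using a minimum support threshold to discard patterns that are too rare to be discriminative.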

