Frequent Pattern Classification Method Based on Information Gain Discrimination (信息增益区分频繁模式分类方法)


Abstract: Research on classification with frequent patterns is still at an early stage, but it has already achieved initial success in classifying relational data, text documents, and graphs. This paper systematically studies frequent pattern classification based on information gain discrimination. It proposes an information-gain-based discriminative frequent pattern classification model (IGFPC) and argues theoretically for the feasibility of the model. By establishing a connection between pattern frequency and discriminative measures based on information gain, the paper proposes a method for setting the minimum support threshold when mining useful frequent patterns. Based on this method and a proposed feature selection algorithm (IGPS), discriminative frequent patterns are generated for building high-quality pattern classifiers. Experiments show that the information-gain-based frequent pattern classification framework achieves good scalability and high classification accuracy on large datasets.
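For readers unfamiliar with the measure, the link between a pattern's frequency and its information gain can be illustrated with a minimal sketch. It uses only the textbook definition IG(C|X) = H(C) - H(C|X) for a binary "pattern present" feature X and class label C. It is not the paper's IGFPC model or IGPS algorithm, and the function names and toy data are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def support(has_pattern: Sequence[bool]) -> float:
    """Fraction of records that contain the pattern."""
    return sum(has_pattern) / len(has_pattern)


def information_gain(has_pattern: Sequence[bool], labels: Sequence[int]) -> float:
    """IG(C|X) = H(C) - H(C|X) for a binary pattern-presence feature X."""
    n = len(labels)
    with_p = [y for x, y in zip(has_pattern, labels) if x]
    without_p = [y for x, y in zip(has_pattern, labels) if not x]
    conditional = (len(with_p) / n) * entropy(with_p) \
        + (len(without_p) / n) * entropy(without_p)
    return entropy(labels) - conditional


# Toy data (hypothetical): eight records, two balanced classes.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
perfect = [True, True, True, True, False, False, False, False]
rare = [True, False, False, False, False, False, False, False]
uninformative = [True, False, True, False, True, False, True, False]

for name, pattern in [("perfect", perfect), ("rare", rare),
                      ("uninformative", uninformative)]:
    print(name, support(pattern), round(information_gain(pattern, labels), 4))
```

In this toy data the rare pattern is perfectly pure (its single occurrence falls in class 1), yet its information gain stays far below that of the frequent pure pattern. This is the kind of frequency-dependent ceiling that motivates choosing a minimum support threshold from a target information gain.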

Authors: Tao Jianwen (陶剑文), Zhao Jieyu (赵杰煜), Yao Qifu (姚奇富)

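Affiliations: Department of Information Engineering, Zhejiang Business Technology Institute (浙江工商职业技术学院); College of Information Science and Engineering, Ningbo University (宁波大学)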

Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2009, Issue 7, pp. 159-163 (5 pages)

Funding: National Natural Science Foundation of China (No. NSFC-60273094); Ningbo Natural Science Foundation (No. 2006A610012)

Keywords: information gain; frequent pattern; classification; discriminative measure

Classification number: TP311 [Automation and Computer Technology: Computer Software and Theory]



