Abstract
Research on classification with frequent patterns is still at an early stage, but it has achieved initial success in classifying relational data, text documents, and graphs. This paper systematically studies frequent pattern classification based on information gain discrimination, proposes an information gain based frequent pattern classification model (IGFPC), and argues theoretically for the feasibility of the model. By establishing a connection between pattern frequency and discriminative measures such as information gain, it develops a strategy for setting the minimum support threshold when mining useful frequent patterns. Based on this strategy, coupled with a proposed feature selection algorithm (IGPS), discriminative frequent patterns can be generated for building high-quality pattern classifiers. Experiments show that the information gain based frequent pattern classification framework achieves good scalability and high classification accuracy on large datasets.
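The link between pattern frequency and information gain can be illustrated with a minimal Python sketch. It is not the paper's IGFPC or IGPS procedure, and it assumes a two-class problem and a pattern treated as a binary feature. All function names, the grid search, and the parameters `theta` (pattern support), `p` (prior of the positive class), and `q` (fraction of positive examples among those containing the pattern) are hypothetical choices made for illustration. The sketch computes the largest information gain a pattern of a given support could reach, then picks the smallest support whose bound meets a chosen information gain threshold.

```python
import math


def binary_entropy(p):
    """Entropy (in bits) of a two-class distribution with positive rate p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def information_gain(theta, q, p):
    """Information gain of a binary pattern feature.

    theta: support of the pattern (fraction of examples containing it)
    q:     P(class = 1 | pattern present)
    p:     P(class = 1) over the whole dataset
    """
    # Positive rate among examples that do NOT contain the pattern,
    # fixed by the constraint theta * q + (1 - theta) * r = p.
    r = (p - theta * q) / (1 - theta) if theta < 1 else 0.0
    r = min(1.0, max(0.0, r))  # guard against rounding error
    conditional = theta * binary_entropy(q) + (1 - theta) * binary_entropy(r)
    return binary_entropy(p) - conditional


def ig_upper_bound(theta, p):
    """Largest information gain any pattern with support theta can reach.

    For fixed theta and p the gain is convex in q, so the maximum lies at
    one of the two extreme feasible values of q.
    """
    q_hi = min(1.0, p / theta)
    q_lo = max(0.0, (p - (1 - theta)) / theta)
    return max(information_gain(theta, q_hi, p),
               information_gain(theta, q_lo, p))


def min_support_for(ig_threshold, p, steps=1000):
    """Smallest support on a uniform grid whose bound reaches ig_threshold."""
    for i in range(1, steps):
        theta = i / steps
        if ig_upper_bound(theta, p) >= ig_threshold:
            return theta
    return None  # no support on the grid can reach the threshold
```

Under these assumptions, a pattern whose support falls below the value returned by `min_support_for` cannot reach the chosen information gain threshold, which is one plausible reading of why a minimum support threshold can be derived from a discriminative measure.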
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal (北大核心)
2009, No. 7, pp. 159-163 (5 pages)
Funding
National Natural Science Foundation of China, No. NSFC-60273094
Ningbo Natural Science Foundation, No. 2006A610012
Keywords
information gain
frequent pattern
classification
discriminative measure