Abstract
This paper studies the classification of imbalanced data and proposes two sampling-based methods for it: under-sampling by FarthestFirst clustering, and weighted random sampling of the samples. Both methods achieve good classification performance. The paper then proposes an imbalanced data classification method that combines weighted random sampling with a classifier ensemble. In this method, the minority-class samples of the training set are assigned various weights and, under each weighting, merged with the majority-class samples; weighted random sampling on each merged set yields N balanced datasets. Each balanced dataset trains one base classifier, and the base classifiers are combined by voting into the final ensemble classifier. Experimental results show that this combination of weighted sampling and classifier ensemble achieves better classification performance.
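The abstract describes the two sampling schemes and the ensemble only in words, so the following is a minimal sketch under stated assumptions, not the paper's implementation. It assumes, as in Weka's FarthestFirst clusterer, that the cluster centres are the points picked by farthest-first traversal and that those centres are kept as the under-sampled majority class. For the ensemble it assumes binary labels (minority = 1), a minority weight of `n_maj / n_min` multiplied by a hypothetical per-dataset scale (one scale per balanced dataset, so N = `len(scales)`), a sample size equal to the original training-set size, and a nearest-centroid base learner standing in for the unspecified base classifiers. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def farthest_first_undersample(majority, k, seed=0):
    """Keep at most k majority samples chosen by farthest-first traversal:
    start from a random point, then repeatedly add the point farthest from
    everything chosen so far."""
    rng = random.Random(seed)
    start = rng.randrange(len(majority))
    chosen = [majority[start]]
    # dist[j] = distance from majority[j] to its nearest chosen point
    dist = [math.dist(p, majority[start]) for p in majority]
    while len(chosen) < min(k, len(majority)):
        i = max(range(len(majority)), key=dist.__getitem__)
        if dist[i] == 0.0:  # only duplicates of chosen points remain
            break
        chosen.append(majority[i])
        dist = [min(d, math.dist(p, majority[i])) for d, p in zip(dist, majority)]
    return chosen


def weighted_sample(minority, majority, minority_weight, size, rng):
    """Draw `size` labelled samples with replacement; a minority sample
    (label 1) has weight `minority_weight`, a majority sample (label 0) has 1."""
    data = [(x, 1) for x in minority] + [(x, 0) for x in majority]
    weights = [minority_weight] * len(minority) + [1.0] * len(majority)
    return rng.choices(data, weights=weights, k=size)


def fit_centroids(sample):
    """Hypothetical base learner: per-class mean vectors (nearest centroid)."""
    sums, counts = {}, Counter()
    for x, y in sample:
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_centroid(centroids, x):
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))


def train_ensemble(minority, majority, scales=(0.8, 1.0, 1.25), seed=0):
    """Build one weighted-sampled training set per (hypothetical) weight
    scale and fit a base classifier on each."""
    rng = random.Random(seed)
    # With weight n_maj / n_min both classes carry equal total weight, so a
    # draw is equally likely to come from either class (scale 1.0).
    balance = len(majority) / len(minority)
    size = len(minority) + len(majority)
    return [
        fit_centroids(weighted_sample(minority, majority, balance * s, size, rng))
        for s in scales
    ]


def vote(models, x):
    """Majority vote over the base classifiers' predictions."""
    return Counter(predict_centroid(m, x) for m in models).most_common(1)[0][0]
```

For example, with 40 majority points on a small grid near the origin and 5 minority points around (20, 20), `train_ensemble(minority, majority)` fits three base classifiers, and `vote` labels a query near (20, 20) as the minority class. An odd number of scales avoids tied votes in the binary case.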
Source
Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》)
CSCD
2013, No. 7, pp. 630-638 (9 pages)
Funding
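National Natural Science Foundation of China (Nos. 61001013, 61102136)
Natural Science Foundation of Fujian Province (Nos. 2011J05158, 2010J01351)
Shenzhen Science and Technology Innovation Basic Research Project (No. JCYJ20120618155655087)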
Keywords
imbalanced classification
preprocessing
ensemble learning