
Research on Imbalanced Data Classification Based on Ensemble and Under-Sampling (cited by: 3)
Abstract: This paper studies the classification of imbalanced data and proposes two sampling-based methods: under-sampling by FarthestFirst clustering, and weighted random sampling of the samples. Both achieve good classification results. The paper then proposes a method that combines weighted random sampling with a classifier ensemble. Minority-class samples in the training set are assigned a range of weights, and each weighted copy is merged with the majority-class samples; weighted random sampling of the merged sets then yields N balanced datasets. A base classifier is trained on each balanced dataset, and the base classifiers are combined by voting into the final ensemble. Experiments show that combining training-set partitioning with a classifier ensemble gives better classification performance.
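The weighted-sampling ensemble described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the weight values, the base classifier, or whether sampling is done with replacement, so the weight schedule in `fit_ensemble`, the small `NearestCentroid` class, and sampling with replacement are all placeholders chosen here. Labels are assumed to be 0 for the majority class and 1 for the minority class.

```python
import numpy as np


def weighted_balanced_sample(X, y, minority_weight, size, rng):
    """Draw `size` rows with replacement; minority rows (y == 1) carry
    weight `minority_weight`, majority rows carry weight 1."""
    w = np.where(y == 1, float(minority_weight), 1.0)
    idx = rng.choice(len(y), size=size, replace=True, p=w / w.sum())
    return X[idx], y[idx]


class NearestCentroid:
    """Stand-in base classifier (an assumption; any classifier would do)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack(
            [X[y == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, X):
        # Squared Euclidean distance from every row to every centroid.
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]


def fit_ensemble(X, y, n_models=5, seed=0):
    """Train one base classifier per weighted, roughly balanced resample."""
    rng = np.random.default_rng(seed)
    n_min = int((y == 1).sum())
    n_maj = int((y == 0).sum())
    base = n_maj / n_min  # weight at which both classes are equally likely
    models = []
    for k in range(n_models):
        # Assumed schedule: spread the weights from 0.8x to 1.2x of `base`.
        weight = base * (0.8 + 0.4 * k / max(n_models - 1, 1))
        Xs, ys = weighted_balanced_sample(X, y, weight, 2 * n_min, rng)
        models.append(NearestCentroid().fit(Xs, ys))
    return models


def predict_vote(models, X):
    """Majority vote over the base classifiers (use an odd n_models)."""
    votes = np.stack([m.predict(X) for m in models])  # (n_models, n_rows)
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

Calling `fit_ensemble(X, y)` and then `predict_vote(models, X_new)` gives the voted labels. Setting the minority weight near the majority-to-minority ratio is what makes each resample roughly balanced; varying it slightly across models is one possible way to give the base classifiers different views of the data.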
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD, 2013, No. 7, pp. 630-638 (9 pages)
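Funding: National Natural Science Foundation of China (Nos. 61001013, 61102136); Natural Science Foundation of Fujian Province (Nos. 2011J05158, 2010J01351); Shenzhen Science and Technology Innovation Basic Research Project (No. JCYJ20120618155655087)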
Keywords: imbalanced classification; preprocessing; ensemble learning

