
Ensemble Learning Method to Classify Minority Class

Cited by: 1
Abstract: Classification problems studied in traditional machine learning usually assume that the classes are balanced. In many domains, however, the class probabilities differ widely: one class is represented by many examples while the other is represented by only a few, and many applications need to identify the important but rare minority class. This paper compares three AdaBoost-based ensemble learning methods and derives lower bounds on their geometric mean accuracy (GMA). The analysis shows that the more imbalanced the classes are, the harder it is for these three methods to raise GMA by improving the accuracy of the base classifiers. Building on this conclusion, a Bagging-based ensemble method called single side Bagging (SSBagging) is proposed. The algorithm samples only the majority class: it retains all minority examples and bootstraps majority examples from the training pool to create each "bag", so the training set in every round is class-balanced. Experiments on UCI datasets confirm the effectiveness of SSBagging.
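The sampling scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several details are assumptions the abstract does not state: GMA is taken to be the geometric mean of the per-class accuracies, each bag draws as many majority examples (with replacement) as there are minority examples, the ensemble predicts by simple majority vote, and a nearest-centroid classifier serves as a stand-in base learner. All function names (`geometric_mean_accuracy`, `single_side_bags`, `fit_centroid`, `ss_bagging_fit`) are hypothetical.

```python
import math
import random
from collections import Counter


def geometric_mean_accuracy(y_true, y_pred, classes=(0, 1)):
    """Geometric mean of per-class accuracies (assumed definition of GMA).

    Assumes every class in `classes` occurs at least once in y_true.
    """
    per_class = []
    for c in classes:
        idx = [i for i, label in enumerate(y_true) if label == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        per_class.append(correct / len(idx))
    return math.prod(per_class) ** (1.0 / len(per_class))


def single_side_bags(y, minority_label, n_bags, rng):
    """Build index bags: all minority examples plus a bootstrap of the majority.

    Assumption: the majority sample size equals the minority count,
    which makes every bag class-balanced.
    """
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    bags = []
    for _ in range(n_bags):
        sampled = [rng.choice(majority) for _ in minority]  # with replacement
        bags.append(minority + sampled)
    return bags


def fit_centroid(X, y):
    """Stand-in base learner (hypothetical): nearest class centroid."""
    counts = Counter(y)
    sums = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(x):
        # Return the label whose centroid is closest in squared distance.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return predict


def ss_bagging_fit(X, y, minority_label, n_bags=11, seed=0, fit=fit_centroid):
    """Train one base learner per balanced bag; predict by majority vote."""
    rng = random.Random(seed)
    models = [
        fit([X[i] for i in bag], [y[i] for i in bag])
        for bag in single_side_bags(y, minority_label, n_bags, rng)
    ]

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict
```

Because the minority examples appear in every bag and only the majority side is resampled, the base learners differ solely in which majority examples they see. Any other base learner can be substituted through the `fit` argument, provided it returns a callable that maps one example to a label.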
Source: Journal of Nanjing University of Aeronautics & Astronautics (indexed in EI, CAS, CSCD, and the Peking University Core Journals list), 2009, No. 4, pp. 520-526 (7 pages)
Funding: Supported by the National Natural Science Foundation of China (60603029)
Keywords: ensemble learning; imbalanced class; single side Bagging