Abstract
Classification problems studied in traditional machine learning usually assume that the classes are balanced, but in many settings the class prior probabilities differ greatly, and many applications require recognizing an important but rare minority class. This paper compares three AdaBoost-based ensemble learning methods and derives lower bounds on their geometric mean accuracy (GMA). The analysis shows that the more imbalanced the classes are, the harder it is for these three methods to raise GMA by improving the accuracy of the base classifiers. Based on this conclusion, a single-sided Bagging algorithm built on Bagging is proposed: it resamples only the majority class while retaining all minority examples, so the training set in each round is class-balanced. Its effectiveness is verified on UCI datasets.
Classification problems in traditional machine learning usually assume that the classes are well balanced, but in many domains one class is represented by many examples while the other is represented by only a few, and many applications require classifying an important but rare minority class. Lower bounds on the geometric mean accuracy (GMA) of three AdaBoost-based ensemble methods are derived. The analysis shows that the more imbalanced the classes are, the more difficult it is to increase GMA by improving the accuracy of the base classifiers. A Bagging-based ensemble method, called single-sided Bagging (SSBagging), is then proposed: it retains all minority examples and bootstraps majority examples from the training set to create class-balanced "bags". Experiments on UCI datasets show the effectiveness of SSBagging.
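The sampling scheme and metric described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the choice to make each bag exactly twice the minority size, and the binary-class GMA form are assumptions for clarity.

```python
import math
import random

def single_side_bags(majority, minority, n_bags, seed=0):
    """Single-sided Bagging sampling sketch: each bag keeps every
    minority example and bootstraps (samples with replacement) an
    equal number of majority examples, so bags are class-balanced."""
    rng = random.Random(seed)
    bags = []
    for _ in range(n_bags):
        # bootstrap only the majority class; minority kept intact
        boot = [rng.choice(majority) for _ in range(len(minority))]
        bags.append(boot + list(minority))
    return bags

def gma(acc_minority, acc_majority):
    """Geometric mean accuracy over the two per-class accuracies."""
    return math.sqrt(acc_minority * acc_majority)
```

A base classifier would then be trained on each bag and the ensemble's vote used for prediction; because every bag is balanced, the base learners are not biased toward the majority class.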
Source
《南京航空航天大学学报》
Indexed in: EI, CAS, CSCD, Peking University Core (北大核心)
2009, No. 4, pp. 520-526 (7 pages)
Journal of Nanjing University of Aeronautics & Astronautics
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60603029)