
通用集成学习算法的构造

A Universal Ensemble Learning Algorithm
Abstract: The construction of ensemble learning algorithms is an important research topic in machine learning. Although the weak learning theorem shows that weak and strong learning algorithms are essentially equivalent, how to construct a good ensemble learning algorithm remains an open problem. Freund and Schapire's AdaBoost algorithm and Schapire and Singer's real AdaBoost algorithm partially solved it. This paper defines a notion of learning error and, taking its minimization as the objective, proposes a universal ensemble learning algorithm whose learning error decreases as simple predictions are added. The algorithm covers the great majority of current classification settings, including multi-class, cost-sensitive, imbalanced, multi-label, and fuzzy classification, and it unifies and generalizes the AdaBoost family of algorithms. To guarantee the generalization ability of the combined prediction function, the paper further proposes that the simple predictions in all of these algorithms can be constructed uniformly from a single feature of each sample. Both theoretical analysis and experimental results show that the proposed algorithms can achieve arbitrarily small learning error without risk of over-learning.
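As a rough, self-contained illustration of the ideas the abstract describes (not the paper's universal algorithm itself), the sketch below implements discrete AdaBoost (Freund and Schapire, 1997), with the "simple predictions constructed from a single feature of the samples" realized as decision stumps. All function names and parameters here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    # Simple prediction based on a single feature: threshold one
    # coordinate of the sample and output +1 or -1.
    return polarity * np.where(X[:, feature] <= threshold, 1.0, -1.0)

def fit_stump(X, y, w):
    # Exhaustively pick the single-feature stump with minimum weighted error.
    n, d = X.shape
    best = (0, 0.0, 1.0, np.inf)  # (feature, threshold, polarity, error)
    for j in range(d):
        for t in np.unique(X[:, j]):
            for p in (1.0, -1.0):
                err = np.sum(w[stump_predict(X, j, t, p) != y])
                if err < best[3]:
                    best = (j, t, p, err)
    return best

def adaboost(X, y, rounds=20):
    # Discrete AdaBoost: reweight samples, combine weighted stumps.
    n = len(y)
    w = np.full(n, 1.0 / n)          # uniform initial sample weights
    ensemble = []                     # list of (alpha, feature, threshold, polarity)
    for _ in range(rounds):
        j, t, p, err = fit_stump(X, y, w)
        err = max(err, 1e-10)         # avoid division by zero / log(0)
        if err >= 0.5:
            break                     # weak learner no better than random
        alpha = 0.5 * np.log((1 - err) / err)
        h = stump_predict(X, j, t, p)
        w *= np.exp(-alpha * y * h)   # up-weight misclassified samples
        w /= w.sum()
        ensemble.append((alpha, j, t, p))
    return ensemble

def predict(ensemble, X):
    # Sign of the weighted vote of all simple predictions.
    score = sum(a * stump_predict(X, j, t, p) for a, j, t, p in ensemble)
    return np.sign(score)
```

On a tiny separable example, e.g. `X = [[0],[1],[2],[3]]` with labels `[1, 1, -1, -1]`, a single stump thresholding the one feature already drives the training error to zero; the boosting loop then simply accumulates copies of it.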
Author: Fu Zhongliang (付忠良)
Source: Journal of Computer Research and Development (《计算机研究与发展》; EI, CSCD, Peking University Core), 2013, No. 4: 861-872 (12 pages)
Funding: Sichuan Province Science and Technology Support Program (2009SZ0214, 2011GZ0171)
Keywords: ensemble learning; machine learning; AdaBoost algorithm; multi-class classification; generalization ability

References (20; first 10 listed)

  • 1 Freund Y, Schapire R E. A decision-theoretic generalization of on-line learning and an application to boosting [J]. Journal of Computer and System Sciences, 1997, 55(1): 119-139.
  • 2 Valiant L G. A theory of the learnable [J]. Communications of the ACM, 1984, 27(11): 1134-1142.
  • 3 Kearns M, Valiant L G. Learning Boolean formulae or factoring, TR-14-88 [R]. Cambridge, MA: Harvard University Aiken Computation Laboratory, 1988.
  • 4 Kearns M, Valiant L G. Cryptographic limitations on learning Boolean formulae and finite automata [C] // Proc of the 21st Annual ACM Symp on Theory of Computing. New York: ACM, 1989: 433-444.
  • 5 Schapire R E, Singer Y. Improved boosting algorithms using confidence-rated predictions [J]. Machine Learning, 1999, 37(3): 297-336.
  • 6 Friedman J, Hastie T, Tibshirani R. Additive logistic regression: A statistical view of boosting [J]. Annals of Statistics, 2000, 28(2): 337-374.
  • 7 Viola P, Jones M. Robust real-time face detection [J]. International Journal of Computer Vision, 2004, 57(2): 137-154.
  • 8 Liang Luhong, Ai Haizhou, Xu Guangyou, Zhang Bo. A survey of human face detection [J]. Chinese Journal of Computers, 2002, 25(5): 449-458 (in Chinese).
  • 9 Fu Zhongliang, Zhao Xianghui. Dynamic combination of classifiers and ensemble learning algorithms based on classifier combination [J]. Journal of Sichuan University (Engineering Science Edition), 2011, 43(2): 58-65 (in Chinese).
  • 10 Zhu J, Rosset S, Zou H, et al. Multi-class AdaBoost [J]. Statistics and Its Interface, 2009(2): 349-360.


