Abstract
In machine learning, the weak learning theorem states that if a weak learning algorithm only slightly better than random guessing can be found, a strong learning algorithm of arbitrary accuracy can be constructed from it. AdaBoost and Bagging are the most widely used methods based on this theorem, but several questions about them remain open. The error analyses of AdaBoost and Bagging are not unified. The training error used in AdaBoost is not the true training error but an error based on sample weights, and whether it can stand in for the true training error needs explanation. The conditions that guarantee the effectiveness of AdaBoost, that is, of the final strong learning algorithm, also need an intuitive explanation so that they can be applied in practice. After adjusting the error rate of Bagging and adopting weighted voting, this paper unifies the algorithm flows and error analyses of AdaBoost and Bagging. An intuitive graphical analysis explains how a weak learning algorithm is promoted to a strong learning algorithm. Building on an explanation and proof of the weak learning theorem based on the law of large numbers, the effectiveness of AdaBoost is analyzed. The paper shows that the purpose of AdaBoost's sample weight adjustment strategy is to keep the distribution of correctly classified samples uniform, and that the training error it uses is equal in probability to the true training error. It also states the rules that must be followed when training weak learning algorithms to ensure the effectiveness of AdaBoost. These results both explain why AdaBoost is effective and provide a method for constructing new ensemble learning algorithms. Finally, following the example of AdaBoost, some suggestions are given for the training set selection strategy in Bagging.
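The law-of-large-numbers intuition behind the weak learning theorem can be illustrated with a small calculation. The sketch below is not taken from the paper. It assumes an idealized setting in which n weak classifiers vote with equal weight, each is correct with the same probability p, and their errors are independent (an assumption that real weak learners trained on the same data generally do not satisfy). The function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n voters is correct.

    Assumes each voter is correct independently with probability p.
    n must be odd so that ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    need = n // 2 + 1  # smallest number of correct votes that wins
    # Sum the binomial probabilities of having `need` or more correct votes.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1)
    )


if __name__ == "__main__":
    # Each voter is only slightly better than random guessing (p = 0.55).
    for n in (1, 11, 101, 501):
        print(n, majority_vote_accuracy(0.55, n))
```

Under these assumptions the majority-vote accuracy rises toward 1 as n grows whenever p exceeds 0.5, and stays at 0.5 when p is exactly 0.5. AdaBoost and Bagging differ from this toy model in how they choose training sets and vote weights, which is the subject of the paper's analysis.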
Source
《计算机研究与发展》
EI
CSCD
PKU Core (Peking University Core Journals)
2008, Issue 10, pp. 1747-1755 (9 pages)
Journal of Computer Research and Development
Funding
West Light Talent Training Fund of the Chinese Academy of Sciences