Abstract
Ensemble learning methods are mainly divided into serial and parallel approaches. The advantage of parallel learning is that its classifiers can be trained and combined in parallel. For classification tasks, the usual combination strategies are voting and stacked learning, represented by random forest and stacked generalization (Stacking), respectively. To further improve the classification performance of Stacking, a multistage ensemble learning method based on random forest is proposed, building on the principle of the classical Stacking algorithm: random forest serves as the base learner at the base level, and both voting and a learned meta-classifier are used as combination methods to reduce generalization error. Experimental results on UCI data sets show that, in terms of Accuracy and F1, the proposed model substantially improves classification performance over classifiers such as Bagging, random forest, and Stacking.
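The abstract's design, random forests at the base level combined through both voting and a learned meta-classifier, can be sketched with scikit-learn. This is a minimal illustration, not the authors' implementation: the number of forests, their hyperparameters, and the choice of LogisticRegression as the meta-learner are assumptions. Note that each random forest already fuses its trees by voting, so stacking several forests under a meta-classifier combines both fusion strategies.

```python
# Sketch: stacked generalization over random-forest base learners.
# Each forest votes internally over its trees (voting fusion);
# the StackingClassifier then learns how to combine the forests'
# predictions (learning fusion).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Base level: several random forests with different seeds.
base = [(f"rf{i}", RandomForestClassifier(n_estimators=50, random_state=i))
        for i in range(3)]

# Meta level: a learned combiner trained on out-of-fold predictions.
clf = StackingClassifier(
    estimators=base,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5)
clf.fit(X_tr, y_tr)

pred = clf.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred))
print("macro-F1:", f1_score(y_te, pred, average="macro"))
```

The `cv=5` argument makes the meta-learner train on cross-validated base-level predictions, which is the standard Stacking safeguard against the meta-learner overfitting to the base learners' training-set outputs.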
Author
徐慧丽
XU Hui-li (School of Mathematics, South China University of Technology, Guangzhou 510640, China)
Source
《高师理科学刊》
2018, No. 2, pp. 25-28, 53 (5 pages)
Journal of Science of Teachers' College and University