Abstract
In traditional ensemble classification algorithms, the ensemble size is generally set to a fixed value, which may lead to low classification accuracy. To address this problem, an accuracy Climbing Ensemble Classification Algorithm (C-ECA) was proposed. Firstly, instead of replacing the same number of worst-performing base classifiers with new ones, this algorithm updates the base classifiers based on accuracy and then determines the optimal ensemble size. Secondly, on the basis of C-ECA, a Dynamic Weighted Ensemble Classification Algorithm based on Climbing (C-DWECA) was proposed. This algorithm introduces a weighting function that, when base classifiers are trained on data streams with different features, obtains the best weight for each base classifier, thereby improving the performance of the ensemble classifier. Finally, in order to detect concept drift earlier and improve the final accuracy, the Fast Hoeffding Drift Detection Method (FHDDM) was adopted. Experimental results show that the accuracy of C-DWECA reaches up to 97.44%, and its average accuracy is about 40% higher than that of the Adaptable Diversity-based Online Boosting (ADOB) algorithm; it also outperforms other comparison algorithms such as Leveraging Bagging (LevBag) and Adaptive Random Forest (ARF).
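The abstract adopts FHDDM for early concept-drift detection. FHDDM is an existing published method (Pesaranghader and Viktor): it keeps a sliding window of prediction outcomes and signals drift when the windowed accuracy falls below the best accuracy seen so far by more than a Hoeffding bound. The sketch below follows that standard formulation, not this paper's specific configuration; `window_size` and `delta` are illustrative defaults, and how FHDDM is wired into C-DWECA is not shown here.

```python
import math
from collections import deque


class FHDDM:
    """Minimal sketch of the Fast Hoeffding Drift Detection Method.

    Tracks a sliding window of prediction outcomes (1 = correct,
    0 = wrong) and reports drift when the windowed accuracy drops
    below its historical maximum by more than the Hoeffding bound
    epsilon = sqrt(ln(1/delta) / (2 * n)).
    """

    def __init__(self, window_size=100, delta=1e-7):
        self.n = window_size
        self.epsilon = math.sqrt(math.log(1.0 / delta) / (2.0 * window_size))
        self.window = deque(maxlen=window_size)
        self.mu_max = 0.0  # best windowed accuracy observed so far

    def update(self, correct):
        """Feed one prediction outcome; return True if drift is detected."""
        self.window.append(1 if correct else 0)
        if len(self.window) < self.n:
            return False  # window not yet full: not enough evidence
        mu = sum(self.window) / self.n
        if mu > self.mu_max:
            self.mu_max = mu
        # Drift: accuracy fell significantly below its historical maximum.
        return (self.mu_max - mu) > self.epsilon
```

For example, feeding a stream of correct predictions followed by a run of errors makes the windowed accuracy drop below `mu_max` by more than `epsilon`, at which point `update` returns `True` and the ensemble would typically be reset or retrained.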
Authors
李小娟
韩萌
王乐
张妮
程浩东
LI Xiaojuan; HAN Meng; WANG Le; ZHANG Ni; CHENG Haodong (School of Computer Science and Engineering, North Minzu University, Yinchuan, Ningxia 750021, China)
Source
《计算机应用》
CSCD
Peking University Core Journals (北大核心)
2022, No. 1, pp. 123-131 (9 pages)
Journal of Computer Applications
Funding
National Natural Science Foundation of China (62062004)
Natural Science Foundation of Ningxia (2020AAC03216).
Keywords
ensemble learning
classification
data stream
dynamic weighting
ensemble number
accuracy
climbing