Abstract
Because a complete training set cannot be collected at once from a data stream, and because the data may be imbalanced and subject to concept drift, both of which degrade classification performance, an online Boosting classification algorithm for imbalanced drifting data streams based on dynamic ensemble selection (BCA-DES) is proposed. The algorithm adopts several balancing measures: the data stream is resampled using a Poisson distribution, and if the data are highly imbalanced, a window that stores minority-class instances is used for secondary sampling to balance the current data. To improve processing efficiency, a classifier selection ensemble strategy is proposed that dynamically adjusts the number of classifiers, and an adaptive windowing detector is used to detect concept drift while the algorithm runs. Experimental results show that the algorithm improves the true positive rate of the minority class and the running efficiency to a certain extent, and achieves relatively good classification performance on imbalanced data streams with concept drift.
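The two-stage resampling idea in the abstract can be illustrated with a minimal Python sketch. It is not the BCA-DES implementation: the class name `TwoStageResampler`, the parameters `window_size` and `imbalance_threshold`, the fixed Poisson rate of 1, and the rule of drawing one stored minority instance per majority arrival are all assumptions made for illustration. Classifier selection and drift detection are left out.

```python
import math
import random
from collections import Counter, deque


def poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


class TwoStageResampler:
    """Hypothetical helper for illustration; not the authors' BCA-DES code."""

    def __init__(self, window_size=50, imbalance_threshold=5.0, seed=0):
        self.counts = Counter()                  # instances seen so far, per class
        self.window = deque(maxlen=window_size)  # most recent minority instances
        self.imbalance_threshold = imbalance_threshold
        self.rng = random.Random(seed)

    def update(self, x, y):
        """Return the training copies generated by one arriving instance."""
        self.counts[y] += 1
        most = max(self.counts.values())
        least = min(self.counts.values())
        is_minority = self.counts[y] < most
        if is_minority:
            self.window.append((x, y))

        # Stage 1: replay the arriving instance k ~ Poisson(1) times.
        batch = [(x, y)] * poisson(1.0, self.rng)

        # Stage 2 (assumed rule): when the stream is highly imbalanced,
        # pair each majority arrival with one stored minority instance.
        highly_imbalanced = most / least > self.imbalance_threshold
        if not is_minority and self.window and highly_imbalanced:
            batch.append(self.rng.choice(self.window))
        return batch
```

Each list returned by `update` would be passed to an online learner in place of the raw instance, so the learner sees a roughly balanced stream even when the minority class is rare.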
Authors
张喜龙
韩萌
陈志强
武红鑫
李慕航
ZHANG Xilong; HAN Meng; CHEN Zhiqiang; WU Hongxin; LI Muhang (School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, Ningxia, China)
Source
《山东大学学报(工学版)》
CAS
CSCD
Peking University Core Journals (北大核心)
2023, Issue 4, pp. 83-92 (10 pages)
Journal of Shandong University(Engineering Science)
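Funding
National Natural Science Foundation of China (62062004)
Natural Science Foundation of Ningxia (2020AAC03216, 2022AAC03279)
Graduate Innovation Project of North Minzu University (YCX21085).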