Active Learning Method for Imbalanced Concept Drift Data Stream
Abstract Data stream classification studies how to provide more reliable data-driven prediction models in open, dynamic environments. The key is to detect and adapt to concept drift in data streams that arrive in real time and change continuously. To detect concept drift and update the classification model, current data stream classification methods usually assume that the labels of all samples are known, which is unrealistic in real scenarios. In addition, real data streams may exhibit high and constantly changing class imbalance ratios, which further complicates the classification task. To address these issues, this paper proposes an active learning method for imbalanced concept drift data streams (ALM-ICDDS). First, a sample prediction certainty measure based on multiple prediction probabilities is defined, and an adaptive adjustment method for the margin threshold matrix is proposed, so that the label query strategy suits imbalanced data streams with many classes. Second, a sample replacement strategy based on memory strength is proposed; it keeps hard-to-distinguish samples, minority-class samples, and samples representative of the current data distribution in the memory window, which improves the classification performance of new base classifiers. Finally, an importance evaluation and update method for base classifiers based on classification accuracy is defined, which updates the ensemble classifier after drift. Comparative experiments on seven synthetic and three real data streams show that ALM-ICDDS outperforms six concept drift data stream learning methods in classification performance.
Authors 李艳红 (LI Yan-Hong); 王甜甜 (WANG Tian-Tian); 王素格 (WANG Su-Ge); 李德玉 (LI De-Yu) (School of Computer and Information Technology, Shanxi University, Taiyuan 030006; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006)
Source Acta Automatica Sinica (《自动化学报》), 2024, No. 3, pp. 589-606 (18 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding Supported by the National Key Research and Development Program of China (2022QY0300-01), the National Natural Science Foundation of China (62076158), and the Basic Research Program of Shanxi Province (202203021221001).
Keywords data stream classification; active learning; concept drift; multi-class imbalance
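
The abstract mentions a label query strategy driven by a margin threshold matrix but gives no formulas. The following is only a rough sketch of what such a strategy could look like, not the paper's algorithm. It assumes the certainty measure is the gap between the two largest predicted probabilities and that one threshold is kept per (top class, runner-up class) pair. The update rule (shrink the threshold after a query, grow it otherwise) is borrowed from the variable-uncertainty idea commonly used in stream active learning. The names `MarginQuery`, `init_threshold`, and `step` are hypothetical.

```python
from typing import List, Sequence, Tuple


def top_two(probs: Sequence[float]) -> Tuple[int, int]:
    """Return the indices of the most and second most probable classes."""
    order = sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)
    return order[0], order[1]


class MarginQuery:
    """Margin-based label query with one threshold per class pair.

    Hypothetical illustration: the threshold layout and the update rule
    are assumptions, not taken from the ALM-ICDDS paper.
    """

    def __init__(self, n_classes: int, init_threshold: float = 0.3,
                 step: float = 0.05) -> None:
        # theta[i][j]: threshold used when class i is predicted first
        # and class j second.
        self.theta: List[List[float]] = [
            [init_threshold] * n_classes for _ in range(n_classes)
        ]
        self.step = step

    def should_query(self, probs: Sequence[float]) -> bool:
        """Decide whether to request the true label for one sample."""
        i, j = top_two(probs)
        margin = probs[i] - probs[j]  # small margin = uncertain prediction
        query = margin < self.theta[i][j]
        # Assumed adaptation: tighten after a query, relax otherwise.
        if query:
            self.theta[i][j] *= 1.0 - self.step
        else:
            self.theta[i][j] *= 1.0 + self.step
        return query


if __name__ == "__main__":
    strategy = MarginQuery(n_classes=3)
    print(strategy.should_query([0.5, 0.4, 0.1]))    # uncertain sample
    print(strategy.should_query([0.9, 0.05, 0.05]))  # confident sample
```

Keeping a separate threshold for each class pair lets the query rate differ between frequently confused classes and easy ones, which is one plausible reason to use a matrix instead of a single scalar when the stream has many imbalanced classes.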