Abstract
When existing algorithms for classifying data streams with concept drift detect a drift, they usually train a new classifier on the new concept and "forget" the previously trained classifiers. In the early stage after a drift, only a few samples of the new concept are available, so the newly built classifier cannot be trained sufficiently within a short time and its classification performance is usually poor. Furthermore, existing data stream classification algorithms based on online transfer learning can use the knowledge of only a single source classifier to assist learning of the new concept, so classification accuracy is unsatisfactory when the historical concepts differ substantially from the new one. To address these problems, this paper proposes CMOL, a multi-source online transfer learning algorithm for classifying data streams with concept drift, which can exploit the knowledge of multiple historical classifiers. CMOL adopts a dynamic classifier weight adjustment mechanism and updates the classifier pool according to the weights of its classifiers, so that the pool covers as many concepts as possible. Experiments show that, compared with related algorithms, CMOL adapts to the new concept faster when concept drift occurs and achieves higher classification accuracy.
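The abstract names two mechanisms, dynamic weighting of historical classifiers and weight-driven updates of a bounded classifier pool, but does not give their rules. The sketch below is a minimal illustration under stated assumptions, not the authors' CMOL: it swaps in a generic Hedge-style (multiplicative-weights) update, assumes binary labels in {-1, +1}, a hypothetical decay factor `beta` and pool `capacity`, evicts the lowest-weight member when the pool is full, and seeds a newcomer with the mean weight of the remaining members. It omits CMOL's actual transfer step. `WeightedPool`, `old_concept` and `new_concept` are hypothetical names.

```python
from typing import Callable, List, Sequence

# A classifier maps a feature vector to a label in {-1, +1} (assumption).
Classifier = Callable[[Sequence[float]], int]


class WeightedPool:
    """Hypothetical bounded pool of historical classifiers with Hedge-style weights."""

    def __init__(self, capacity: int, beta: float = 0.5) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        if not 0.0 < beta < 1.0:
            raise ValueError("beta must lie in (0, 1)")
        self.capacity = capacity
        self.beta = beta  # multiplicative penalty for a wrong prediction
        self.members: List[Classifier] = []
        self.weights: List[float] = []

    def _normalize(self) -> None:
        total = sum(self.weights)
        if total > 0:
            self.weights = [w / total for w in self.weights]

    def predict(self, x: Sequence[float]) -> int:
        # Weighted majority vote; ties (including an empty pool) go to +1.
        score = sum(w * clf(x) for clf, w in zip(self.members, self.weights))
        return 1 if score >= 0 else -1

    def update(self, x: Sequence[float], y: int) -> None:
        # Multiply the weight of every member that misclassified (x, y) by beta,
        # then renormalize so the weights sum to 1.
        for i, clf in enumerate(self.members):
            if clf(x) != y:
                self.weights[i] *= self.beta
        self._normalize()

    def add(self, clf: Classifier) -> None:
        # Called when a drift is detected: evict the lowest-weight member if the
        # pool is full, then insert the newcomer with the mean remaining weight.
        if len(self.members) >= self.capacity:
            worst = min(range(len(self.weights)), key=self.weights.__getitem__)
            del self.members[worst]
            del self.weights[worst]
        init = sum(self.weights) / len(self.weights) if self.weights else 1.0
        self.members.append(clf)
        self.weights.append(init)
        self._normalize()


def old_concept(x: Sequence[float]) -> int:
    return 1 if x[0] > 0 else -1  # label = sign of the first feature


def new_concept(x: Sequence[float]) -> int:
    return -old_concept(x)  # after the drift the labels are flipped


pool = WeightedPool(capacity=2)
pool.add(old_concept)
pool.add(new_concept)
for _ in range(5):
    pool.update([1.0], -1)  # the stream now follows the new concept
print(pool.predict([1.0]))  # -1
```

Because the weights are renormalized after every update, a pooled classifier that keeps matching the current concept quickly dominates the vote, and evicting by weight keeps the members that fit recent data best. Whether CMOL uses this eviction criterion or this initialization is not stated in the abstract.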
Authors
QIN Yi-xiu (秦一休)
WEN Yi-min (文益民)
HE Qian (何倩)
QIN Yi-xiu, WEN Yi-min, HE Qian (School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China; Guangxi Key Laboratory of Trustworthy Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China)
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2019, No. 1, pp. 64-72 (9 pages)
Funding
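National Natural Science Foundation of China (61363029, 61866007)
Guangxi Natural Science Foundation (2018GXNSFDA138006)
Guangxi Key Laboratory of Trustworthy Software project (KX201721)
Guangxi Colleges and Universities Key Laboratory of Image and Graphic Intelligent Processing project (GIIP201505)
Guangxi Collaborative Innovation Center of Cloud Computing and Big Data project (YD16E12)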
Keywords
Multi-source transfer learning
Online learning
Concept drift
Data stream classification