Multi-source Online Transfer Learning Algorithm for Classification of Data Streams with Concept Drift (cited by: 3)
Abstract Existing algorithms for handling concept drift typically retrain a classifier on the new concept once drift is detected and "forget" the previously trained classifiers. In the early stage of a drift, few samples of the new concept are available, so the newly built classifier cannot be trained sufficiently in a short time and its classification performance is usually poor. Moreover, existing data stream classification algorithms based on online transfer learning can use the knowledge of only a single classifier (a single source domain) to assist learning on the new concept, so classification accuracy is unsatisfactory when the historical concept is not similar to the new one. To address these problems, this paper proposes CMOL, a multi-source online transfer learning algorithm for classifying data streams with concept drift that can exploit the knowledge of multiple historical classifiers. CMOL adopts a dynamic classifier weight adjustment mechanism and updates the classifier pool according to the weights of the classifiers in it, so that the pool covers as many concepts as possible. Experiments show that, compared with other related algorithms, CMOL adapts to a new concept faster when concept drift occurs and achieves higher classification accuracy.
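The abstract does not spell out CMOL's update rules, so the following is only a minimal sketch of the general idea it describes: a bounded pool of historical classifiers, a weighted vote, a weight adjustment after each labelled sample, and eviction of the lowest-weight classifier when the pool is full. The class name `WeightedPool`, the multiplicative penalty `beta`, the rule for initialising a new member's weight, and the eviction rule are all assumptions made for illustration, not the authors' method.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# A classifier here is any function mapping a feature vector to a label in {-1, +1}.
Classifier = Callable[[Sequence[float]], int]


@dataclass
class WeightedPool:
    """Hypothetical pool of classifiers combined by a weighted vote."""
    capacity: int = 3          # assumed maximum pool size
    beta: float = 0.5          # assumed penalty factor applied on a mistake
    members: List[Classifier] = field(default_factory=list)
    weights: List[float] = field(default_factory=list)

    def _normalize(self) -> None:
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]

    def add(self, clf: Classifier) -> None:
        # Assumed eviction rule: drop the lowest-weight member when full.
        if len(self.members) >= self.capacity:
            worst = min(range(len(self.weights)), key=self.weights.__getitem__)
            del self.members[worst]
            del self.weights[worst]
        # Assumed initialisation: start level with the current best member.
        self.members.append(clf)
        self.weights.append(max(self.weights) if self.weights else 1.0)
        self._normalize()

    def predict(self, x: Sequence[float]) -> int:
        score = sum(w * clf(x) for w, clf in zip(self.weights, self.members))
        return 1 if score >= 0 else -1

    def update(self, x: Sequence[float], y: int) -> None:
        # Shrink the weight of every member that misclassifies (x, y).
        for i, clf in enumerate(self.members):
            if clf(x) != y:
                self.weights[i] *= self.beta
        self._normalize()


def always_pos(x: Sequence[float]) -> int:
    """Stands in for a classifier trained on an outdated concept."""
    return 1


def sign_clf(x: Sequence[float]) -> int:
    """Stands in for a classifier that matches the current concept."""
    return 1 if x[0] > 0 else -1
```

In this sketch a classifier that keeps disagreeing with the current concept loses influence on the vote gradually rather than being discarded at once, which is one simple way to keep knowledge from several historical concepts available.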
Authors QIN Yi-xiu (秦一休); WEN Yi-min (文益民); HE Qian (何倩) (School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China; Guangxi Key Laboratory of Trustworthy Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China)
Source Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2019, No. 1, pp. 64-72 (9 pages)
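Funding Supported by the National Natural Science Foundation of China (61363029, 61866007), the Guangxi Natural Science Foundation (2018GXNSFDA138006), a project of the Guangxi Key Laboratory of Trustworthy Software (KX201721), a project of the Guangxi Colleges and Universities Key Laboratory of Intelligent Processing of Computer Images and Graphics (GIIP201505), and a project of the Guangxi Cooperative Innovation Center of Cloud Computing and Big Data (YD16E12)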
Keywords Multi-source transfer learning; Online learning; Concept drift; Data stream classification
