Abstract
Incremental learning is an effective way to process data streams. To address the limitation that existing incremental classification algorithms work only on small-scale data sets or in centralized environments, an incremental classification model based on the Hadoop cloud computing platform is proposed for the incremental classification of large-scale data sets. Built on the idea of selective ensemble learning, the model designs a Map function that learns base classifiers from the incremental sample blocks arriving at different times, and a Reduce function that selectively integrates the classifiers obtained at different times, thereby realizing incremental learning on the cloud computing platform. Simulation results show that the proposed method achieves better performance than existing ones and handles the concept drift problem in data streams well.
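The Map/Reduce selective-ensemble scheme described in the abstract can be sketched roughly as follows. This is a toy single-process simulation in Python; the 1-D threshold "base classifier", the block generator, and all function names are illustrative assumptions, not the authors' Hadoop implementation:

```python
import random

# Toy simulation of the scheme from the abstract: each incremental sample
# block is "mapped" to a base classifier, then the classifiers are "reduced"
# into a selective ensemble. All names here are hypothetical.

random.seed(0)

def make_block(n=50, noise=0.1):
    """One incremental sample block: (x, label) pairs, label = (x > 0.5) with noise."""
    block = []
    for _ in range(n):
        x = random.random()
        y = int(x > 0.5)
        if random.random() < noise:
            y = 1 - y  # simulate label noise in the stream
        block.append((x, y))
    return block

def train_base_classifier(block):
    """'Map' step: fit one base classifier (here, a 1-D threshold) on a block."""
    best_t, best_acc = None, -1.0
    for t, _ in block:  # try every observed value as a candidate threshold
        acc = sum((x > t) == bool(y) for x, y in block) / len(block)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def selective_ensemble(thresholds, validation, k=3):
    """'Reduce' step: keep only the k base classifiers scoring best on a
    validation set -- the selective-ensemble idea."""
    def val_acc(t):
        return sum((x > t) == bool(y) for x, y in validation) / len(validation)
    return sorted(thresholds, key=val_acc, reverse=True)[:k]

def predict(ensemble, x):
    """Majority vote over the selected base classifiers."""
    votes = sum(x > t for t in ensemble)
    return int(2 * votes > len(ensemble))

blocks = [make_block() for _ in range(5)]                     # stream of blocks
models = [train_base_classifier(b) for b in blocks]           # map phase
ensemble = selective_ensemble(models, make_block(noise=0.0))  # reduce phase
print(predict(ensemble, 0.95), predict(ensemble, 0.05))
```

On the real platform the map and reduce steps would run as distributed Hadoop tasks over large data blocks; selecting only the best-scoring base classifiers is also one simple way to discard models trained before a concept drift.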
Source
Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition) (《南京邮电大学学报(自然科学版)》)
Peking University Core Journal (北大核心)
2012, No. 5, pp. 146-152, 158 (8 pages in total)
Funding
National Natural Science Foundation of China (61073114)
Climbing Program of Nanjing University of Posts and Telecommunications (NY210010)
Keywords
incremental classification
Hadoop
cloud computing
concept drift