Abstract
This paper proposes a classification algorithm that combines an improved random subspace method (RSM) with the C4.5 decision tree algorithm. Decision trees built with C4.5 serve as the base classifiers of the ensemble. At the start of each iteration, as in RSM, a subset of the features is removed from the training data. SMOTE is then applied to the reduced dataset, generating synthetic examples that differ markedly in feature space and data distribution, and the resulting balanced dataset is used to train the base classifier. In this way, the base classifiers are given training sets with greater variance and diversity. The final decision is obtained by fusing the base classifiers' outputs through majority voting. Experimental results show that the proposed method achieves high recognition rates on both the minority and majority classes, provides better classification performance than other approaches, and is effective and feasible for imbalanced datasets.
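The pipeline described in the abstract (random feature subset, then SMOTE, then a tree, then majority voting) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the base learner is a small entropy-based tree with binary numeric splits standing in for C4.5 (no gain ratio, no pruning), the problem is assumed to be binary with at least two minority examples, and the function names (`build_tree`, `smote`, `fit_ensemble`, `predict`) and parameters (`n_trees`, `subspace_size`, `k`) are hypothetical choices that do not come from the paper.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def build_tree(X, y, depth=0, max_depth=4):
    """Small entropy-based tree on numeric features (a stand-in, not full C4.5)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority  # leaf: predict the most common label
    best = None  # (information gain, feature index, threshold)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            gain = entropy(y) - remainder
            if best is None or gain > best[0]:
                best = (gain, f, t)
    if best is None or best[0] <= 0:
        return majority
    _, f, t = best
    left = [(row, lab) for row, lab in zip(X, y) if row[f] <= t]
    right = [(row, lab) for row, lab in zip(X, y) if row[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth))


def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are labels."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def smote(X, y, minority, k, rng):
    """Add interpolated minority examples until the two classes are balanced."""
    minority_rows = [row for row, lab in zip(X, y) if lab == minority]
    n_needed = (len(y) - len(minority_rows)) - len(minority_rows)
    synthetic = []
    for _ in range(max(0, n_needed)):
        base = rng.choice(minority_rows)
        neighbours = sorted((r for r in minority_rows if r is not base),
                            key=lambda r: math.dist(r, base))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return X + synthetic, y + [minority] * len(synthetic)


def fit_ensemble(X, y, n_trees=15, subspace_size=2, k=3, seed=0):
    """Per iteration: pick a random feature subset, balance it with SMOTE, fit a tree."""
    rng = random.Random(seed)
    minority = Counter(y).most_common()[-1][0]  # least frequent label
    ensemble = []
    for _ in range(n_trees):
        feats = rng.sample(range(len(X[0])), subspace_size)
        X_sub = [[row[f] for f in feats] for row in X]
        X_bal, y_bal = smote(X_sub, y, minority, k, rng)
        ensemble.append((feats, build_tree(X_bal, y_bal)))
    return ensemble


def predict(ensemble, row):
    """Majority vote over the trees, each seeing only its own feature subset."""
    votes = Counter(predict_tree(tree, [row[f] for f in feats]) for feats, tree in ensemble)
    return votes.most_common(1)[0][0]


# Toy imbalanced data (made up for illustration): 30 majority vs 6 minority rows.
data_rng = random.Random(1)
X = [[data_rng.uniform(0, 1) for _ in range(4)] for _ in range(30)] \
    + [[data_rng.uniform(2, 3) for _ in range(4)] for _ in range(6)]
y = [0] * 30 + [1] * 6
ensemble = fit_ensemble(X, y)
print(predict(ensemble, [0.5] * 4), predict(ensemble, [2.5] * 4))
```

Because SMOTE runs after the feature subset is drawn, nearest neighbours are computed in a different subspace at each iteration, so the synthetic examples differ from tree to tree. An odd `n_trees` avoids tied votes in the binary case.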
Source
《佛山科学技术学院学报(自然科学版)》
CAS
2013, No. 5, pp. 22-26 (5 pages)
Journal of Foshan University (Natural Science Edition)
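Funding
Foshan Science and Technology Development Special Fund Project (2011AA100061)
Foshan Industry-University-Research Special Fund Project (2012HC100272)
Foshan Education Bureau Research Project on an Intelligent Evaluation Index System (DX20120220)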
Keywords
imbalanced data classification
random subspace method
decision tree
ensemble learning