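An imbalanced data classification method combining an improved random subspace method with decision trees

Published English title: Imbalanced data classification improvement with combination of random subspace method and decision tree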
Abstract: This paper proposes a classification algorithm that combines an improved random subspace method (RSM) with the C4.5 decision tree algorithm. Decision trees built with C4.5 serve as the base classifiers of the ensemble. At the start of each iteration, as in RSM, a subset of the features is removed from the training data; SMOTE is then applied to the reduced dataset, and the result is used to train the base classifier. The synthetic examples produced in this way differ markedly in feature space and data distribution, so each base classifier receives a diverse, balanced training set. The final decision is obtained by fusing the base classifiers' outputs through majority voting. Experimental results show that the proposed method achieves high recognition rates on both the minority and the majority class, outperforms the other approaches compared, and is an effective and feasible way to handle imbalanced datasets.
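The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: labels are assumed binary with 1 as the minority class, at least two minority samples are assumed, and a one-split decision stump stands in for C4.5 to keep the sketch short and dependency-free. All function names and parameters (`fit_ensemble`, `n_estimators`, `subspace_size`, `k`) are hypothetical.

```python
import random
from collections import Counter


def smote(minority, n_new, k, rng):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def train_stump(X, y):
    """Stand-in for C4.5 (assumption): one split chosen to minimise training errors."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(lab != l_lab for lab in left)
            errors += sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # every feature is constant: always predict the majority label
        label = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), label, label)
    return best[1:]


def fit_ensemble(X, y, n_estimators=11, subspace_size=2, k=3, seed=0):
    """Per round: random feature subset, SMOTE on the reduced data, train a base learner."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_estimators):
        feats = rng.sample(range(len(X[0])), subspace_size)
        Xs = [[row[f] for f in feats] for row in X]
        minority = [row for row, lab in zip(Xs, y) if lab == 1]
        # Oversample until the minority class matches the majority class in size.
        n_new = max(len(y) - 2 * len(minority), 0)
        synth = smote(minority, n_new, k, rng)
        stump = train_stump(Xs + synth, list(y) + [1] * len(synth))
        ensemble.append((feats, stump))
    return ensemble


def predict(ensemble, x):
    """Fuse the base learners' outputs by majority vote."""
    votes = []
    for feats, (f, t, l_lab, r_lab) in ensemble:
        votes.append(l_lab if x[feats[f]] <= t else r_lab)
    return Counter(votes).most_common(1)[0][0]
```

Calling `fit_ensemble(X, y)` on a list of feature rows and 0/1 labels returns the trained ensemble, and `predict(ensemble, x)` classifies one row. Applying SMOTE after the feature subset is drawn means the nearest neighbours, and therefore the synthetic points, differ from round to round, which is the source of diversity the abstract refers to.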
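Author: Hu Xiaosheng (胡小生)
Source: Journal of Foshan University (Natural Science Edition), CAS, 2013, No. 5, pp. 22-26 (5 pages)
Funding: Foshan Science and Technology Development Special Fund Project (2011AA100061); Foshan Industry-University-Research Special Fund Project (2012HC100272); Foshan Education Bureau Research Project on an Intelligent Evaluation Index System (DX20120220)
Keywords: imbalanced data classification; random subspace method; decision tree; ensemble learning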