
Imbalanced Data Classification Method Based on Support Vector Over-sampling (cited by: 4)
Abstract: Traditional support vector machines perform poorly on imbalanced data. To improve the recognition accuracy of the minority class, an over-sampling method based on support vectors is proposed. First, noise is removed from the original data set using the K-nearest-neighbor idea. Next, a support vector machine is trained on the training set to obtain the support vectors, and noise following a certain rule is added to each support vector of the minority class, increasing the number of minority-class samples to obtain a relatively balanced data set. Finally, a support vector machine is trained on the new data set. Experimental results show that the proposed method is effective on both artificial data sets and UCI standard data sets.
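The three stages described in the abstract can be sketched as follows. This is only an illustrative sketch, not the paper's implementation: the abstract does not specify the K-nearest-neighbor cleaning rule or the noise distribution, so the majority-vote filter in `knn_clean`, the zero-mean Gaussian noise, and the parameters `k` and `sigma` are all assumptions, as are the function names and the toy data. It uses scikit-learn's `SVC` and `NearestNeighbors` and assumes a binary problem.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC


def knn_clean(X, y, k=3):
    """Assumed cleaning rule: drop a sample if fewer than half of its
    k nearest neighbours share its label."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)            # column 0 is normally the point itself
    neighbour_labels = y[idx[:, 1:]]
    agree = (neighbour_labels == y[:, None]).mean(axis=1)
    keep = agree >= 0.5
    return X[keep], y[keep]


def sv_oversample(X, y, minority_label, sigma=0.1, seed=0):
    """Add noisy copies of the minority-class support vectors until the two
    classes have the same size (binary case). Gaussian noise is an assumption."""
    rng = np.random.default_rng(seed)
    svm = SVC().fit(X, y)
    sv_idx = svm.support_                # indices of support vectors within X
    minority_sv = X[sv_idx[y[sv_idx] == minority_label]]
    n_needed = int((y != minority_label).sum() - (y == minority_label).sum())
    if n_needed <= 0 or len(minority_sv) == 0:
        return X, y
    # Cycle through the minority support vectors so each one is perturbed.
    base = minority_sv[np.arange(n_needed) % len(minority_sv)]
    synthetic = base + rng.normal(0.0, sigma, size=base.shape)
    X_new = np.vstack([X, synthetic])
    y_new = np.concatenate([y, np.full(n_needed, minority_label)])
    return X_new, y_new


# Toy imbalanced data (hypothetical): 200 majority points, 20 minority points.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               rng.normal(3.0, 1.0, size=(20, 2))])
y = np.array([0] * 200 + [1] * 20)

X_clean, y_clean = knn_clean(X, y)                        # stage 1: remove noise
X_bal, y_bal = sv_oversample(X_clean, y_clean, 1)         # stage 2: over-sample
final_model = SVC().fit(X_bal, y_bal)                     # stage 3: retrain
```

Perturbing only the support vectors places the synthetic minority samples near the decision boundary, which is where the SVM solution is determined. A larger `sigma` spreads them further from the original support vectors.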
Author: Cao Lu (曹路)
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2016, No. 12, pp. 97-100 (4 pages)
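Funding: Supported by the Guangdong Province Characteristic Innovation Project (2015KTSCX143), the Guangdong Province Young Innovative Talents Project (2015KQNCX172), the Jiangmen Science and Technology Plan Project (江科[2016]189号, 江科[2015]138号), and the Wuyi University Youth Fund (2013zk07, 2015zk11).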
Keywords: support vector, sampling, imbalanced data, classification
