Abstract
Traditional support vector machines (SVMs) perform poorly on imbalanced data. To improve the recognition accuracy of the minority class, an over-sampling method based on support vectors is proposed. First, noise is removed from the original data set using the K-nearest-neighbor idea. Next, an SVM is trained on the training set to obtain the support vectors, and noise following a certain pattern is added to each support vector of the minority class, increasing the number of minority-class samples until a relatively balanced data set is obtained. Finally, an SVM is trained on the new data set. Experimental results show that the method is effective on both artificial data sets and UCI standard data sets.
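A minimal, standard-library Python sketch of the four steps above, under assumptions the abstract does not settle: a linear soft-margin SVM fitted with Pegasos (stochastic sub-gradient descent) stands in for the paper's SVM, a sample counts as noise only when all k of its nearest neighbors carry the other label, and the "noise following a certain pattern" is taken to be isotropic Gaussian noise with standard deviation `sigma`. The function names `knn_clean`, `train_linear_svm`, `decision` and `sv_oversample` and the parameters `k`, `lam`, `sigma` are hypothetical.

```python
import math
import random


def knn_clean(X, y, k=3):
    """Drop a sample when all k of its nearest neighbors carry the other label.

    Assumed noise rule: the source only says K nearest neighbors are used.
    """
    kept_X, kept_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        neighbors = sorted(
            (math.dist(xi, xj), yj)
            for j, (xj, yj) in enumerate(zip(X, y))
            if j != i
        )[:k]
        if any(label == yi for _, label in neighbors):
            kept_X.append(xi)
            kept_y.append(yi)
    return kept_X, kept_y


def decision(w, x):
    """Linear decision value f(x); the last weight is the bias term."""
    return sum(wj * xj for wj, xj in zip(w, x)) + w[-1]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear soft-margin SVM fitted with Pegasos (stochastic sub-gradient).

    Stand-in for the paper's SVM; kernel and solver are assumptions.
    """
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)  # feature weights plus a regularized bias
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * decision(w, X[i]) < 1  # hinge loss is active
            w = [(1 - eta * lam) * wj for wj in w]  # shrink: L2 regularization
            if violated:
                xi = list(X[i]) + [1.0]  # augment with 1.0 for the bias
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w


def sv_oversample(X, y, k=3, sigma=0.1, seed=0):
    """KNN cleaning, then minority oversampling around SVM support vectors."""
    X, y = knn_clean(X, y, k)
    w = train_linear_svm(X, y)
    minority = 1 if y.count(1) < y.count(-1) else -1
    deficit = abs(y.count(1) - y.count(-1))
    # Support vectors: minority samples with y * f(x) <= 1 (on or inside the margin).
    svs = [x for x, t in zip(X, y) if t == minority and t * decision(w, x) <= 1.0]
    if not svs:  # assumption: fall back to every minority sample
        svs = [x for x, t in zip(X, y) if t == minority]
    rng = random.Random(seed)
    new_X, new_y = [list(x) for x in X], list(y)
    for n in range(deficit):
        base = svs[n % len(svs)]  # cycle through the minority support vectors
        # Assumed noise rule: isotropic Gaussian noise with std sigma.
        new_X.append([v + rng.gauss(0.0, sigma) for v in base])
        new_y.append(minority)
    return new_X, new_y


if __name__ == "__main__":
    majority = [[0.2 * i, 0.2 * j] for i in range(5) for j in range(4)]
    minority = [[3.0, 3.0], [3.2, 3.0], [3.0, 3.2], [3.2, 3.2], [3.1, 3.1]]
    X = majority + minority + [[3.1, 3.0]]  # last point is a mislabeled outlier
    y = [-1] * 20 + [1] * 5 + [-1]
    X_bal, y_bal = sv_oversample(X, y)
    w = train_linear_svm(X_bal, y_bal)  # final SVM on the rebalanced set
    print(y_bal.count(1), y_bal.count(-1))  # prints: 20 20
```

In the toy run the mislabeled point at (3.1, 3.0) is removed because its three nearest neighbors are all minority samples, and 15 synthetic minority samples are generated to match the 20 majority samples. Data that is not linearly separable would call for a kernel SVM instead of the linear stand-in used here.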
Source
Computer Science (计算机科学)
CSCD
Peking University Core Journals (北大核心)
2016, Issue 12, pp. 97-100 (4 pages)
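Funding
Guangdong Province Characteristic Innovation Project (2015KTSCX143)
Guangdong Province Young Innovative Talents Project (2015KQNCX172)
Jiangmen Science and Technology Plan Projects (Jiangke [2016] No. 189; Jiangke [2015] No. 138)
Wuyi University Youth Fund (2013zk07, 2015zk11)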
Keywords
Support vector, Sampling, Imbalanced data, Classification