Abstract
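To reduce the impact of imbalanced data on the classification performance of support vector machines, an under-sampling classification algorithm based on the twice support vector machine is proposed. The algorithm under-samples the majority class according to how much each sample contributes to the classification hyperplane, over-samples the minority class, and rebuilds the training dataset from the result. It can remove noisy samples, and a control parameter sets how many samples are deleted. Experiments show that the algorithm improves the classification performance of support vector machines on imbalanced datasets.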
To reduce the effect of imbalanced datasets on SVM classification performance, a new under-sampling algorithm based on the twice support vector machine is proposed for imbalanced data classification. For the majority class, the algorithm deletes the samples that lie far from the classification hyperplane; for the minority class, it uses an over-sampling algorithm to add new samples. The method can mitigate the imbalanced-dataset problem and improve the classification performance of SVM. Experimental results on an artificial dataset show that the algorithm is effective for imbalanced datasets, especially for the minority class samples.
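The resampling step described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the abstract does not give the contribution measure, the over-sampling method, or the control parameter, so the sketch assumes a linear hyperplane (`w`, `b`) obtained from a first SVM fit, uses plain geometric distance to rank majority samples, and uses random duplication for the minority class. The names `keep_ratio`, `rebalance`, and the helper functions are hypothetical.

```python
import math
import random


def hyperplane_distance(x, w, b):
    """Geometric distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def undersample_majority(majority, w, b, keep_ratio):
    """Keep the fraction keep_ratio of majority samples closest to the hyperplane.

    keep_ratio is a hypothetical stand-in for the paper's control parameter.
    """
    n_keep = max(1, int(keep_ratio * len(majority)))
    ranked = sorted(majority, key=lambda x: hyperplane_distance(x, w, b))
    return ranked[:n_keep]


def oversample_minority(minority, target_size, rng):
    """Randomly duplicate minority samples until target_size is reached.

    Random duplication is an assumption; the abstract does not name the method.
    """
    n_extra = max(0, target_size - len(minority))
    return list(minority) + [rng.choice(minority) for _ in range(n_extra)]


def rebalance(majority, minority, w, b, keep_ratio=0.5, seed=0):
    """Rebuild a training set: under-sample the majority, over-sample the minority."""
    rng = random.Random(seed)
    kept = undersample_majority(majority, w, b, keep_ratio)
    boosted = oversample_minority(minority, len(kept), rng)
    return kept, boosted


# Toy data: the hyperplane x1 = 0 separates the two classes.
majority = [(0.5, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
minority = [(-1.0, 0.0), (-2.0, 0.0)]
kept, boosted = rebalance(majority, minority, w=(1.0, 0.0), b=0.0, keep_ratio=0.5)
```

Under this reading, a second SVM would then be trained on `kept` plus `boosted`, which is one plausible interpretation of the "twice" in the algorithm's name.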
Source
《商洛学院学报》 (Journal of Shangluo University)
2014, No. 4, pp. 38-41, 61 (5 pages)
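Funding
Shangluo University Scientific Research Fund Project (13SKY024)
Shangluo University Education and Teaching Reform Research Project (10JYJX02011)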
Keywords
support vector machine
imbalanced dataset
under-sampling
classification hyperplane