Abstract
This paper proposes a method for adjusting the separating hyperplane of a support vector machine (SVM) trained on imbalanced two-class data. First, a standard SVM is trained on the original data to obtain the normal vector of a separating hyperplane. Second, the high-dimensional samples are projected onto this normal vector, yielding one-dimensional data. Third, the ratio of the penalty factors for the two classes is set from the standard deviations of the projected data and the two class sample sizes. Finally, a standard SVM is trained a second time with these penalty factors to obtain a new separating hyperplane. Experiments show that the method is effective: in general it balances the error rates of the two classes, and it can even reduce them.
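The projection and penalty-ratio steps can be illustrated with a minimal sketch. The abstract does not state the formula that combines the standard deviations and sample sizes, so the rule in `penalty_ratio` is an assumption made purely for illustration, not the paper's formula. The function names, the toy data, and the vector `w` (a stand-in for the normal vector from the first training pass) are likewise hypothetical.

```python
from statistics import pstdev


def project(samples, w):
    """Project each sample onto the direction w (dot product divided by |w|)."""
    norm = sum(c * c for c in w) ** 0.5
    return [sum(x * c for x, c in zip(s, w)) / norm for s in samples]


def penalty_ratio(pos, neg, w):
    """Return C_pos / C_neg from the projected spreads and the class sizes.

    Assumed rule (NOT taken from the paper): a class receives a larger
    penalty factor when it has fewer samples and when its projection onto
    w is more spread out. Both projected spreads must be non-zero.
    """
    s_pos = pstdev(project(pos, w))
    s_neg = pstdev(project(neg, w))
    return (s_pos / s_neg) * (len(neg) / len(pos))


# Toy 2-D data: 2 minority-class samples and 4 majority-class samples.
pos = [(2.0, 0.0), (4.0, 0.0)]
neg = [(-1.0, 1.0), (-2.0, -1.0), (-3.0, 2.0), (-4.0, 0.0)]
w = (1.0, 0.0)  # hypothetical normal vector from the first SVM pass

ratio = penalty_ratio(pos, neg, w)
print(f"C_pos / C_neg = {ratio:.4f}")
```

In a full pipeline, `w` would come from the first training pass and the resulting ratio would set per-class penalties for the second pass. With scikit-learn, for example, that could be done through the `class_weight` argument of `SVC`, although the paper does not say which implementation was used.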
Source
Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》)
EI
CSCD
PKU Core (北大核心)
2008, No. 2, pp. 136-141 (6 pages)
Funding
Supported by the National Natural Science Foundation of China (No. 60574075)
Keywords
Imbalanced Data, Feature Extraction, Support Vector Machines (SVM), Projection, Standard Deviation