Abstract
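Classification of imbalanced data is an active research topic in machine learning. Traditional classification algorithms assume that classes are evenly distributed or that misclassification costs are equal, so they struggle to produce satisfactory results on such data. This paper proposes an SVM classification method based on weighted clustering centroids. Clustering is performed separately on the positive and negative samples; for each cluster, the centroid and a weight factor represent the distribution and the number of samples in that cluster, and an equal number of centroids from each class, together with their weight factors, is used to train the SVM model. Experimental results show that the method makes the training samples more representative and improves classification performance compared with other sampling methods.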
Classification of imbalanced data has become an active research topic in machine learning. Traditional classification algorithms assume that different classes have balanced distributions or equal misclassification costs, which makes it hard for them to achieve good classification results on imbalanced data. This paper proposes a support vector machine (SVM) classification method based on weighted clustering centroids. First, unsupervised clustering is applied to the positive and negative samples separately to extract the centroid of each cluster, which serves as a compact representative of the samples in that cluster. Next, all cluster centroids form a new, balanced training set. To minimize the information lost during clustering, each centroid is associated with a weight factor proportional to the number of samples in its cluster. Finally, all cluster centroids and their weight factors are used to train the improved SVM model. Experimental results show that the proposed method makes the samples in the training set more representative and achieves better classification performance than other sampling techniques for imbalanced data.
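The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, the number of clusters, or the exact form of the improved SVM, so everything concrete here is an assumption: plain k-means, `k = 3` clusters per class, raw cluster sizes as weight factors, and a linear SVM trained by subgradient descent on a weighted hinge loss. All function names (`kmeans`, `build_weighted_centroids`, `train_weighted_svm`, `predict`) and the synthetic data are hypothetical and are not taken from the paper.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns (centroids, cluster sizes)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # Recompute each centroid as the cluster mean; keep the old one if the cluster is empty.
        centroids = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[j]
                     for j, cl in enumerate(clusters)]
    return centroids, [len(cl) for cl in clusters]

def build_weighted_centroids(pos, neg, k):
    """Cluster each class separately; the same k per class gives a balanced training set."""
    X, y, w = [], [], []
    for samples, label in ((pos, 1), (neg, -1)):
        centroids, sizes = kmeans(samples, k)
        X += centroids
        y += [label] * k
        w += sizes  # assumption: weight factor = number of samples in the cluster
    return X, y, w

def train_weighted_svm(X, y, w, lam=0.01, epochs=200, lr=0.1):
    """Linear SVM via batch subgradient descent on a weighted hinge loss (assumed form)."""
    dim, total = len(X[0]), float(sum(w))
    theta, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        g_theta, g_b = [lam * t for t in theta], 0.0
        for xi, yi, wi in zip(X, y, w):
            margin = yi * (sum(t * a for t, a in zip(theta, xi)) + b)
            if margin < 1:  # only margin violators contribute to the subgradient
                for d in range(dim):
                    g_theta[d] -= wi / total * yi * xi[d]
                g_b -= wi / total * yi
        theta = [t - lr * g for t, g in zip(theta, g_theta)]
        b -= lr * g_b
    return theta, b

def predict(theta, b, x):
    return 1 if sum(t * a for t, a in zip(theta, x)) + b >= 0 else -1

# Synthetic imbalanced data: 20 positive samples versus 200 negative samples.
rng = random.Random(0)
pos = [(rng.gauss(4, 0.5), rng.gauss(4, 0.5)) for _ in range(20)]
neg = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(200)]

k = 3
X, y, w = build_weighted_centroids(pos, neg, k)
theta, b = train_weighted_svm(X, y, w)
print(predict(theta, b, (4.0, 4.0)), predict(theta, b, (0.0, 0.0)))
```

In this sketch the SVM sees only `2 * k` points, the same number for each class, while the weights carry the cluster sizes so that large clusters still count for more in the loss. Whether the paper normalizes the weights per class or uses a kernel SVM is not stated in the abstract.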
Source
《智能系统学报》
CSCD
Peking University Core Journals (北大核心)
2013, No. 3, pp. 261-265 (5 pages)
CAAI Transactions on Intelligent Systems
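Funding
Foshan Science and Technology Development Special Fund Project (2011AA100061)
Foshan Industry-University-Research Special Fund Project (2012HC100272)
Foshan Education Bureau Research Project on an Intelligent Evaluation Index System (DX20120220)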
Keywords
machine learning
imbalanced data classification
clustering centroid
support vector machine