Abstract
To address class imbalance in training data sets, an ensemble Support Vector Machine (SVM) classification algorithm that combines sampling methods with ensemble methods is proposed. The algorithm first performs unsupervised clustering on the imbalanced training set. Local-attention SVMs at the bottom layer then partition the data set locally, so that the local characteristics of the data are captured precisely. Finally, a top-level SVM makes the classification prediction. Evaluation results on UCI data sets show that, compared with currently popular algorithms such as the sampling-based Kernelized Synthetic Minority Over-sampling Technique (K-SMOTE), the ensemble-based Gradient Tree Boosting (GTB) and the cost-sensitive ensemble algorithm AdaCost, the proposed algorithm noticeably improves classification performance and alleviates the data set imbalance problem to a certain extent.
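The abstract names the three stages (clustering, local SVMs, top-level SVM) but not the concrete models, so the Python sketch below is only one possible reading, under stated assumptions. k-means stands in for the unspecified unsupervised clustering. Each cluster gets a plain linear SVM with a class-weighted hinge loss; this is a swapped-in cost-sensitive reweighting, not the paper's "local attention" SVM, whose definition is not given here. The top-level SVM is trained on the local SVMs' decision values, with each point contributing only its own cluster's score. `ClusteredSVMEnsemble`, `make_toy_data` and all hyperparameters (`k`, `lam`, `lr`, `epochs`) are hypothetical, and only the standard library is used.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(point, centroids):
    """Index of the closest centroid under squared Euclidean distance."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's k-means on the training points, ignoring labels; returns centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Linear SVM fitted by full-batch subgradient descent on a class-weighted hinge loss.

    Labels are -1/+1. Each class gets weight n / (2 * n_class), so the minority class
    carries as much total loss as the majority class (assumed imbalance handling).
    """
    n, d = len(X), len(X[0])
    n_pos = sum(1 for t in y if t == 1)
    cw = {1: n / (2 * max(n_pos, 1)), -1: n / (2 * max(n - n_pos, 1))}
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0  # gradient of (lam / 2) * ||w||^2
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # margin violated: add hinge subgradient
                for j in range(d):
                    gw[j] -= cw[yi] * yi * xi[j] / n
                gb -= cw[yi] * yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


class ClusteredSVMEnsemble:
    """Hypothetical two-level model: k-means, one local SVM per cluster, top-level SVM."""

    def __init__(self, k=3, seed=0):
        self.k, self.seed = k, seed

    def _features(self, x):
        # Gated stacking: only the local SVM of x's own cluster contributes a score.
        c = nearest(x, self.centroids)
        feats = [0.0] * self.k
        if self.local[c] is not None:
            w, b = self.local[c]
            feats[c] = dot(w, x) + b
        return feats

    def fit(self, X, y):
        self.centroids = kmeans(X, self.k, seed=self.seed)
        assign = [nearest(x, self.centroids) for x in X]
        self.local = []
        for c in range(self.k):
            Xc = [x for x, a in zip(X, assign) if a == c]
            yc = [t for t, a in zip(y, assign) if a == c]
            self.local.append(train_linear_svm(Xc, yc) if Xc else None)
        self.top = train_linear_svm([self._features(x) for x in X], y)
        return self

    def predict(self, X):
        w, b = self.top
        return [1 if dot(w, self._features(x)) + b >= 0 else -1 for x in X]


def make_toy_data(seed=42):
    """Hypothetical imbalanced toy set: 48 majority (-1) points in two blobs, 12 minority (+1)."""
    rng = random.Random(seed)

    def blob(cx, cy, m):
        return [[cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5)] for _ in range(m)]

    return blob(0, 0, 24) + blob(4, 0, 24) + blob(2, 4, 12), [-1] * 48 + [1] * 12


if __name__ == "__main__":
    X, y = make_toy_data()
    model = ClusteredSVMEnsemble(k=3).fit(X, y)
    print(model.predict([[0.0, 0.0], [4.0, 0.0], [2.0, 4.0]]))
```

On this toy data the centers of the two majority blobs are expected to be labeled -1 and the minority blob +1. Gating the top-level features by cluster keeps each local SVM from scoring points far outside the region it was trained on; reproducing the paper's results would require its actual definitions of the local-attention SVM and the sampling step.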
Authors
ZHOU Yuhao (周于皓)
ZHANG Hongling (张红玲)
LI Fangfei (李芳菲)
QI Peng (祁鹏)
College of Petroleum Engineering, China University of Petroleum, Beijing 102249, China; School of Media and Communication, Wuhan Textile University, Wuhan, Hubei 430000, China
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University core journal (北大核心)
2018, Issue 4, pp. 945-948, 954 (5 pages in total)
Keywords
unbalanced data set
Support Vector Machine (SVM)
ensemble algorithm
unsupervised clustering