Abstract
Credit scoring commonly faces two problems: small, imbalanced data sets, and the fact that different kinds of misclassification cause different losses. Building on the traditional kernel SVM model, this paper uses the SMOTE oversampling algorithm to rebalance the data. Cross-validation is then used to find the optimal kernel parameters, and a dissymmetric error cost (DEC) is added that raises the cost of classifying a high-risk applicant as low-risk, giving a model better suited to credit scoring. Validation on data shows that the new model remedies the weakness of the traditional SVM on imbalanced data sets and avoids the loss of generalization ability that arises when a small data set has too few samples. Compared with the model without DEC, the model with DEC has slightly lower overall classification accuracy but clearly fewer errors in which high-risk applicants are classified as low-risk, which makes it more suitable for credit scoring.
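The abstract combines two techniques: SMOTE oversampling of the minority (high-risk) class and a dissymmetric error cost (DEC) that penalizes missed high-risk samples more heavily in the SVM objective. The following is a minimal pure-Python sketch of both, under stated assumptions: labels are +1 for high risk and -1 for low risk; a linear SVM trained by subgradient descent stands in for the paper's kernel SVM, and the cross-validated kernel-parameter search is omitted. The function names `smote`, `train_dec_svm` and `predict`, the costs `c_pos=2.0` / `c_neg=1.0`, and the toy data are hypothetical and are not taken from the paper.

```python
import math
import random

# Sketch only: a linear SVM replaces the paper's kernel SVM; all names,
# cost values and data below are hypothetical illustrations.


def smote(minority, n_new, k=5, rng=None):
    """SMOTE-style oversampling: create n_new synthetic minority samples.

    Each synthetic point is x + gap * (nb - x), where x is a randomly chosen
    minority sample, nb is one of its k nearest minority neighbours and
    gap is drawn uniformly from [0, 1).
    """
    rng = rng or random.Random(0)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest neighbours of x inside the minority class (Euclidean distance)
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def train_dec_svm(X, y, c_pos=2.0, c_neg=1.0, lam=0.01, lr=0.05, epochs=500):
    """Linear soft-margin SVM with dissymmetric error costs, trained by
    full-batch subgradient descent on
        lam/2 * ||w||^2 + (1/n) * sum_i c_i * max(0, 1 - y_i * (w.x_i + b)),
    where c_i = c_pos for high-risk samples (y = +1) and c_neg otherwise.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # subgradient of the L2 regulariser
        gb = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # hinge loss is active for this sample
                c = c_pos if yi == 1 else c_neg
                for j in range(d):
                    gw[j] -= c * yi * xi[j] / n
                gb -= c * yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w, b, x):
    """Return +1 (high risk) or -1 (low risk)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy, hypothetical data: 3 high-risk (+1) vs 6 low-risk (-1) applicants.
high_risk = [[2.0, 2.0], [2.5, 1.5], [1.5, 2.5]]
low_risk = [[-2.0, -2.0], [-1.5, -2.5], [-2.5, -1.5],
            [-2.0, -1.0], [-1.0, -2.0], [-3.0, -2.0]]

# Step 1: oversample the minority class until the classes are balanced.
synthetic = smote(high_risk, n_new=len(low_risk) - len(high_risk), k=5)
X = high_risk + synthetic + low_risk
y = [1] * (len(high_risk) + len(synthetic)) + [-1] * len(low_risk)

# Step 2: train with a higher cost on missed high-risk samples.
w, b = train_dec_svm(X, y, c_pos=2.0, c_neg=1.0)
print([predict(w, b, x) for x in X])
```

Setting `c_pos` above `c_neg` makes each unit of hinge loss on a high-risk sample weigh more, so the decision boundary tends to shift toward the low-risk class: fewer high-risk applicants are labelled low-risk, at the price of more false alarms. This is the trade-off the abstract reports between slightly lower accuracy and fewer high-risk-as-low-risk errors.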
Author
ZHU An-an (朱安安), College of Management, University of Shanghai for Science and Technology, Shanghai 200093, China
Source
Software Guide (《软件导刊》), 2018, Issue 10, pp. 64-67 (4 pages)