Abstract
To address the poor identification of imbalanced samples by traditional credit risk prediction models, this paper uses oversampling methods and machine learning algorithms to improve the accuracy and stability of credit bond default prediction models. It introduces five categories of financial indicators (profitability, cash flow, operating capacity, capital structure, and solvency) together with non-financial indicators, applies the SMOTE, Borderline SMOTE, and ADASYN methods to handle sample imbalance, and performs risk identification with logistic regression (LR), support vector machines (SVM), random forests (RF), and XGBoost. Conclusions: for imbalanced credit bond default samples, under 1000 bootstrap resamples drawn with replacement, the ADASYN-RF model achieves better AUC and Recall than the LR, SVM, and RF models; the actual Recall of the ADASYN-SVM model on default samples is 36.86 percentage points higher than without oversampling. Applying interpretable machine learning methods, the paper finds that interest-bearing debt/total invested capital, local fiscal revenue/outstanding debt, and the asset-liability ratio, among others, are important factors influencing credit bond default.
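The oversampling step named in the abstract can be illustrated with a minimal sketch of plain SMOTE-style interpolation, written with the Python standard library only. It is not the authors' implementation and does not cover Borderline SMOTE or ADASYN. The function name `smote`, its parameters, and the toy two-feature points are hypothetical and are not taken from the paper's data.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points by SMOTE-style interpolation.

    Hypothetical helper for illustration only; not the paper's code.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


# Toy minority (default) class with two made-up features per issuer.
defaults = [(0.82, 0.10), (0.90, 0.05), (0.78, 0.12), (0.88, 0.08)]
new_points = smote(defaults, n_new=6, k=2)
```

Each synthetic point lies on a line segment between an existing minority sample and one of its nearest minority neighbours, so the minority class grows without exact duplication. In a full pipeline, oversampling of this kind would normally be applied to the training split only, before fitting a classifier.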
Authors
徐舒玥
曹艳华
XU Shu-yue;CAO Yan-hua
Source
《科学决策》
CSSCI
2023, Issue 5, pp. 190-200 (11 pages)
Scientific Decision Making