Abstract
To address high-dimensional, sparse customer credit features and class imbalance in big-data settings, and thereby improve the accuracy of customer credit evaluation, this paper proposes a credit risk evaluation model based on the RF-FL-LightGBM algorithm. First, a random forest (RF) ranks the high-dimensional features by importance and filters them, removing redundant or uninformative features and those likely to cause overfitting. Second, a balanced binary cross-entropy loss (FL), obtained by modifying the Focal Loss function, is used as the loss function of the LightGBM model to mitigate the drop in accuracy caused by the imbalance between positive and negative samples, thereby improving classification performance. Experiments on the historical customer data set of a financial leasing company show that the F1-score and AUC of the RF-FL-LightGBM model are markedly higher than those of the XGBoost and LightGBM models. The RF-FL-LightGBM algorithm not only handles high-dimensional, sparse, imbalanced sample data effectively, but also classifies customer attributes more accurately and runs more efficiently.
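The abstract does not give the exact form of the modified loss, so the following is only a minimal sketch of the standard binary Focal Loss that the FL component builds on, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The function names, the default values alpha = 0.25 and gamma = 2.0, and the finite-difference derivatives are illustrative assumptions, not the authors' implementation. A gradient boosting model such as LightGBM needs the first and second derivatives of the loss with respect to the raw score, which is what `focal_grad_hess` approximates.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def focal_loss(z: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Standard binary focal loss for one sample.

    z is the raw (pre-sigmoid) score and y is the label (0 or 1).
    alpha and gamma are assumed defaults, not values taken from the paper.
    """
    p = sigmoid(z)
    p_t = p if y == 1 else 1.0 - p                # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha    # class-balancing weight
    p_t = max(p_t, 1e-12)                         # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def focal_grad_hess(z: float, y: int, alpha: float = 0.25,
                    gamma: float = 2.0, eps: float = 1e-4):
    """Approximate d(loss)/dz and d2(loss)/dz2 by central differences.

    Finite differences keep the sketch short; a production objective
    would normally use the closed-form derivatives instead.
    """
    f0 = focal_loss(z, y, alpha, gamma)
    fp = focal_loss(z + eps, y, alpha, gamma)
    fm = focal_loss(z - eps, y, alpha, gamma)
    grad = (fp - fm) / (2.0 * eps)
    hess = (fp - 2.0 * f0 + fm) / (eps * eps)
    return grad, hess
```

With gamma = 0 and alpha = 0.5 the expression reduces to half the ordinary cross-entropy, while larger gamma shrinks the contribution of samples that are already classified confidently, so training concentrates on the hard and typically minority-class cases.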
Authors
苗月
吴陈
MIAO Yue; WU Chen (School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212000)
Source
《计算机与数字工程》
2024, No. 3, pp. 808-813 (6 pages)
Computer & Digital Engineering