Abstract
In the era of big data, high-dimensional sparse features and sample imbalance are increasingly prominent in user credit data. To handle the high-dimensional features, this paper uses random forest (RF) to extract features from both the Filter and Wrapper perspectives, and applies the SMOTE algorithm to resample the training set. In the model training stage, particle swarm optimization is used to improve the classification accuracy of the XGBoost model. Finally, an open-source bank dataset, provided by Xiamen International Bank, is used for empirical verification. The results show that, compared with a standard GBDT model and with grid search, the proposed model achieves better accuracy and convergence in evaluation.
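The record does not include the authors' implementation. As a rough illustration of two steps named in the abstract, the following is a minimal pure-Python sketch of SMOTE-style interpolation and of a particle swarm search over two hyperparameters. Everything in it is an assumption made for illustration: the function names `smote_like` and `pso_minimize`, the swarm settings (`w`, `c1`, `c2`), the search bounds, and above all `toy_validation_error`, a hypothetical stand-in for the validation error of an XGBoost model that is not trained here.

```python
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment x -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


def pso_minimize(objective, bounds, n_particles=20, n_iters=60,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best particle swarm optimization inside box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp the particle back into the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_validation_error(params):
    """Hypothetical stand-in for a model's validation error as a function
    of (learning_rate, max_depth); a real run would train and score a model."""
    learning_rate, max_depth = params
    return (learning_rate - 0.1) ** 2 + ((max_depth - 6.0) / 10.0) ** 2


# Assumed search box: learning_rate in [0.01, 0.5], max_depth in [2, 12].
search_bounds = [(0.01, 0.5), (2.0, 12.0)]
best_params, best_error = pso_minimize(toy_validation_error, search_bounds)

minority_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
new_samples = smote_like(minority_samples, n_new=10)
```

In practice the objective would wrap cross-validated training of the classifier, and an integer parameter such as tree depth would be rounded before use. The continuous toy objective is kept only so that the sketch runs without external libraries.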
Authors
ZHANG Lei (张雷)
WANG Jiaqi (王家琪)
FEI Zhiyou (费职友)
LUO Shuai (罗帅)
SUI Jingqi (隋京岐)
School of Mathematics and Statistics, Chongqing Jiaotong University, Chongqing 400074, China; School of Economics and Management, Chongqing Jiaotong University, Chongqing 400074, China; School of Information Science and Engineering, Chongqing Jiaotong University, Chongqing 400074, China
Source
Modern Electronics Technique (《现代电子技术》)
Peking University Core Journal (北大核心)
2020, Issue 16, pp. 76-81 (6 pages)
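Funding
National Natural Science Foundation of China (11401061)
National Natural Science Foundation of China (11501065)
Chongqing Municipal Education Commission projects (KJ1600504, KJ1600512)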