Abstract
To address the low classification accuracy of traditional machine learning classifiers on high-dimensional personal credit data, a method for selecting the optimal feature subset is proposed, based on the Pearson correlation coefficient (PCC) and on the mutual information method combined with a gradient boosting decision tree (MI-GBDT). The method is applied to three classifiers: decision tree, naive Bayes, and support vector machine. PCC is used to remove strongly correlated features; the mutual information method and GBDT are then used to compute a comprehensive importance score for each remaining feature. Combined with an improved search strategy based on feature ranking, this yields the optimal feature subset for each of the three classifier models. Experimental results show that the feature subsets selected by the method improve classification accuracy on the three models by 4.33%, 13.29%, and 20.27%, respectively.
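The three stages named in the abstract (correlation filtering, combined importance ranking, ranking-based search) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not state the correlation threshold, how the mutual information and GBDT scores are combined, or what the "improved" search does, so the 0.9 threshold, the equal-weight average of min-max normalized scores, and the plain greedy forward pass are all assumptions. The function names and the `evaluate` callback (for example, a cross-validated accuracy of one classifier) are hypothetical, and the mutual information and GBDT scores are taken as precomputed inputs.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def drop_correlated(columns, threshold=0.9):
    """Keep a feature only if |r| < threshold against every feature kept so far.

    columns: dict mapping feature name -> list of values.
    threshold: assumed cut-off for "strongly correlated" (not from the paper).
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def min_max(scores):
    """Rescale a dict of scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in scores.items()}


def combined_ranking(mi_scores, gbdt_scores):
    """Rank features by an equal-weight average of normalized scores (assumed rule)."""
    mi, gb = min_max(mi_scores), min_max(gbdt_scores)
    combined = {k: 0.5 * mi[k] + 0.5 * gb[k] for k in mi}
    return sorted(combined, key=combined.get, reverse=True)


def ranked_forward_search(ranking, evaluate):
    """Greedy pass over the ranking: keep a feature only if the score improves.

    evaluate: hypothetical callback mapping a list of feature names to a score.
    """
    best, best_score = [], float("-inf")
    for name in ranking:
        candidate = best + [name]
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

Because the search is run with a classifier-specific `evaluate`, the same ranking can yield a different subset for each classifier, which matches the abstract's statement that a separate optimal subset is generated for each of the three models.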
Authors
查志成
梁雪春
ZHA Zhi-cheng; LIANG Xue-chun (College of Electrical Engineering and Control Science, Nanjing Tech University, Nanjing 211816, China)
Source
《计算机工程与设计》
PKU Core Journal (北大核心)
2022, No. 6, pp. 1678-1685 (8 pages)
Computer Engineering and Design
Funding
Young Scientists Fund of the National Natural Science Foundation of China (11801267).