Abstract
To address the "hidden poverty" and suspected "false poverty" that are common in current campus poverty-alleviation work, this paper proposes a method for identifying impoverished students based on the random forest algorithm and decision trees. First, basic information, consumption records, and other student data are collected from a smart-campus big data environment, and 10 features with classification ability are selected. Next, variable importance is scored using the permutation-based decrease in residual mean square. Finally, students are discriminated and classified using the random forest algorithm and decision trees. The experimental results show that the proposed method achieves a certain level of accuracy and that, compared with the AdaBoost method, the random forest method performs better in both prediction accuracy and mean absolute error.
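The variable-importance step in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the data are synthetic, the "forest" is a hypothetical stand-in made of bagged one-split regression stumps, and importance is read here as the increase in mean squared error on held-out data after one feature column is randomly permuted. All function names and parameters below are assumptions made for illustration.

```python
import random

rng = random.Random(0)

def make_data(n):
    # Synthetic stand-in data (NOT from the paper): three features per student;
    # only feature 0 determines the label, features 1 and 2 are pure noise.
    X = [[rng.random() for _ in range(3)] for _ in range(n)]
    y = [1.0 if row[0] < 0.4 else 0.0 for row in X]
    return X, y

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def fit_stump(X, y, feature):
    """Best single split on one feature, chosen by squared error.
    Returns (sse, feature, threshold, left_mean, right_mean)."""
    mean = sum(y) / len(y)
    # Fallback: no split at all, every row is predicted as the mean.
    best = (sum((t - mean) ** 2 for t in y), feature, float("inf"), mean, mean)
    values = sorted(set(row[feature] for row in X))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for row, t in zip(X, y) if row[feature] <= thr]
        right = [t for row, t in zip(X, y) if row[feature] > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if sse < best[0]:
            best = (sse, feature, thr, lm, rm)
    return best

def fit_forest(X, y, n_trees=15, n_sub=2):
    """Bagged stumps with a random feature subset per tree (simplified forest)."""
    forest, n = [], len(X)
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]       # bootstrap sample
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        feats = rng.sample(range(len(X[0])), n_sub)      # random feature subset
        forest.append(min(fit_stump(Xb, yb, f) for f in feats))
    return forest

def predict(forest, X):
    # Average the stump outputs over all trees.
    return [sum(lm if row[f] <= thr else rm for _, f, thr, lm, rm in forest)
            / len(forest) for row in X]

def permutation_importance(forest, X, y):
    """Increase in MSE after shuffling each feature column in turn."""
    base = mse(y, predict(forest, X))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)
        Xp = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        scores.append(mse(y, predict(forest, Xp)) - base)
    return scores

X_train, y_train = make_data(120)
X_test, y_test = make_data(120)
forest = fit_forest(X_train, y_train)
importances = permutation_importance(forest, X_test, y_test)
print(importances)
```

In this toy setup the informative feature should receive a clearly larger score than the two noise features. A real study would more likely use full decision trees and out-of-bag samples rather than stumps and a separate held-out set.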
Authors
王泽原
赵丽
胡俊
WANG Ze-yuan; ZHAO Li; HU Jun (School of Investigation and Counter-Terrorism, People’s Public Security University of China, Beijing 100038; School of Software, Shanxi University, Taiyuan 030013; School of Computer Science, Beijing University of Aeronautics and Astronautics, Beijing 100191, China)
Source
《湘潭大学学报(自然科学版)》
CAS
2018, No. 6, pp. 115-120 (6 pages)
Journal of Xiangtan University (Natural Science Edition)
Funding
Basic Research Program of the Shanxi Provincial Department of Science and Technology (2014021039-6)
Keywords
impoverished student identification
big data
random forest algorithm
decision tree
data cleaning