Abstract
Few studies have used statistical machine learning to predict the development performance of national high-tech zones. Using publicly released statistical data on national high-tech zones from 2008 to 2012, this study builds statistical machine learning models for predicting development performance based on two decision-tree ensemble learning algorithms: random forest (RF) and gradient boosting decision tree (GBDT). The results show that both models perform well, with coefficients of determination of 0.950 and 0.960, respectively. GBDT also has a lower mean absolute error and a lower root mean square error than RF, indicating slightly better generalization performance. According to the feature importance results, product sales income, total income, total industrial output value, and year-end assets have the strongest ability to predict the development performance of national high-tech zones.
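The abstract compares the two models on three metrics: the coefficient of determination, the mean absolute error, and the root mean square error. As a minimal sketch of how these are computed, the plain-Python functions below evaluate two sets of predictions against the same targets. The function names and the toy numbers are illustrative assumptions and do not come from the paper or its data.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """Square root of the average squared difference."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values, made up for illustration only (not the paper's data).
y_true = [1.0, 2.0, 3.0, 4.0]
pred_a = [1.5, 2.5, 2.5, 3.5]  # hypothetical predictions from "model A"
pred_b = [1.2, 2.2, 2.8, 3.8]  # hypothetical predictions from "model B"

for name, pred in (("model A", pred_a), ("model B", pred_b)):
    print(
        name,
        "R2 =", round(r_squared(y_true, pred), 3),
        "MAE =", round(mean_absolute_error(y_true, pred), 3),
        "RMSE =", round(root_mean_square_error(y_true, pred), 3),
    )
```

In this toy case model B has the higher coefficient of determination and the lower error on both measures, which is the same pattern the abstract reports for GBDT relative to RF. Generalization is judged by computing these metrics on data held out from training.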
Authors
TI Lijun (遆俐君)
YANG Xiaokun (杨潇坤)
WU Yao (吴瑶)
Affiliations: Department of Social Science and Policy, Institute of Education, University College London, London WC1E 6BT, UK; Yiban Development Center, Lanzhou University, Lanzhou 730030, China; Center for Studies of Ethnic Minorities in Northwest China, Lanzhou University, Lanzhou 730030, China
Source
Journal of BeiBu Gulf University (《北部湾大学学报》)
2020, No. 8, pp. 35-41 (7 pages)
Keywords
national high-tech zones
development performance
ensemble learning
random forest (RF)
decision tree
gradient boosting decision tree (GBDT)