Abstract
The Akaike information criterion (AIC), the Bayesian information criterion (BIC), and cross-validation (CV) are important tools for model selection and evaluation in statistics. To study the relationship between log wage and 11 covariates, including average weekly working hours, IQ score, knowledge of the world of work score, and years of education, this paper builds 12 candidate linear models by starting from the model with no covariates and adding one covariate at a time. The optimal model is first selected on the full sample under each of the three criteria. The optimal model is then selected on a training set, and the three criteria are judged by error analysis on a test set. For all three criteria, a smaller value indicates a better model. On the full sample, all three criteria select the linear model containing all 11 covariates. When the full sample is split into a training set and a test set and the procedure is repeated 1000 times, AIC, BIC, and CV select the model containing all 11 covariates in 100%, 99%, and 100% of the runs, respectively. For log wage, the optimal linear model is therefore the one containing all 11 covariates, and the three criteria show no clear difference in performance.
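The selection procedure summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the data are synthetic (the wage data are not reproduced here), the covariates are added in column order, CV is taken to be 5-fold with mean squared prediction error (the abstract does not say which CV variant was used), and AIC and BIC are computed for Gaussian linear models up to an additive constant shared by all candidates, as n·log(RSS/n) + 2k and n·log(RSS/n) + k·log(n). All function names are hypothetical.

```python
import numpy as np

def fit_rss(X, y):
    """Least-squares fit; returns coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

def aic_bic(X, y):
    """Gaussian AIC and BIC, up to a constant common to all candidate models."""
    n, k = X.shape
    _, rss = fit_rss(X, y)
    base = n * np.log(rss / n)
    return base + 2 * k, base + k * np.log(n)

def cv_mse(X, y, folds=5, seed=0):
    """K-fold cross-validated mean squared prediction error (assumed CV variant)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    errs = []
    for test in np.array_split(idx, folds):
        train = np.setdiff1d(idx, test)
        beta, _ = fit_rss(X[train], y[train])
        errs.append(np.mean((y[test] - X[test] @ beta) ** 2))
    return float(np.mean(errs))

def select_models(Z, y, folds=5, seed=0):
    """Number of covariates chosen by (AIC, BIC, CV) among the nested models."""
    n, p = Z.shape
    scores = []
    for m in range(p + 1):  # m = 0 is the intercept-only model
        X = np.column_stack([np.ones(n), Z[:, :m]])
        aic, bic = aic_bic(X, y)
        scores.append((aic, bic, cv_mse(X, y, folds, seed)))
    best = np.array(scores).argmin(axis=0)  # smaller is better for all three
    return tuple(int(i) for i in best)

# Synthetic stand-in data: 11 covariates, all with a real effect (assumption).
rng = np.random.default_rng(42)
n, p = 500, 11
Z = rng.normal(size=(n, p))
coef = np.linspace(1.0, 0.5, p)
y = 1.0 + Z @ coef + rng.normal(scale=0.5, size=n)

chosen = select_models(Z, y)
print(chosen)  # covariate counts picked by AIC, BIC, CV
```

To mimic the repeated evaluation described in the abstract, one could wrap `select_models` in a loop over random training and test splits and count how often each criterion picks the full model.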
Source
Advances in Applied Mathematics (《应用数学进展》)
2021, No. 1, pp. 351-358 (8 pages)