Abstract
With the growth of big data and the internet, web surveys have become increasingly widespread. Most web survey samples, however, are non-probability samples, so the traditional theory of probability-sampling inference is difficult to apply to them. Solving the inference problem for web survey samples is therefore an urgent need for the development of web surveys in the big data context. This paper proposes, for the first time from a modeling perspective, three basic approaches to the problem. The first is model-based inference on inclusion probabilities: a propensity score model built on machine learning and variable selection can be used to estimate the inclusion probabilities, which are then used to infer the population. The second is model-based inference on the target variable: a parametric, non-parametric, or semi-parametric superpopulation model can be fitted directly to the target variable for estimation. The third is dual modeling of both inclusion probabilities and the target variable: weighted estimation and hybrid inference can combine the propensity score model with the superpopulation model. Finally, inclusion-probability modeling based on the generalized boosted model is used as an example to demonstrate a concrete solution.
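The first approach (estimating inclusion probabilities with a boosted propensity model and weighting by their inverse) can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the function names `fit_boosted_propensity` and `weighted_mean` are hypothetical, the booster is a hand-written gradient boosting of one-split stumps under logistic loss, and it assumes the covariate is known for every population unit (in practice a reference probability sample usually plays that role).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_boosted_propensity(x, s, n_rounds=300, rate=0.1):
    """Gradient boosting of one-split stumps under logistic loss.
    x: covariate per unit; s: 1 if the unit is in the web sample, else 0.
    Returns an estimated inclusion probability for every unit."""
    base = sum(s) / len(s)
    score = [math.log(base / (1.0 - base))] * len(x)  # initial log-odds
    cuts = sorted(set(x))[:-1]                        # candidate split points
    if not cuts:                                      # constant covariate
        return [base] * len(x)
    for _ in range(n_rounds):
        p = [sigmoid(f) for f in score]
        resid = [si - pi for si, pi in zip(s, p)]     # negative gradient
        best_gain, best_cut = -1.0, cuts[0]
        for c in cuts:
            left = [r for r, xi in zip(resid, x) if xi <= c]
            right = [r for r, xi in zip(resid, x) if xi > c]
            # Maximizing this gain minimizes the stump's squared error.
            gain = sum(left) ** 2 / len(left) + sum(right) ** 2 / len(right)
            if gain > best_gain:
                best_gain, best_cut = gain, c
        # One Newton step per leaf: sum(residual) / sum(p * (1 - p)).
        num, den = [0.0, 0.0], [0.0, 0.0]
        for r, pi, xi in zip(resid, p, x):
            leaf = 0 if xi <= best_cut else 1
            num[leaf] += r
            den[leaf] += pi * (1.0 - pi)
        for i, xi in enumerate(x):
            leaf = 0 if xi <= best_cut else 1
            score[i] += rate * num[leaf] / den[leaf]
    return [sigmoid(f) for f in score]

def weighted_mean(y, p):
    """Hajek-type estimator: mean of y weighted by 1 / inclusion probability."""
    w = [1.0 / pi for pi in p]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

# Synthetic population (an assumption for illustration only): group x=1 is
# over-represented in the web sample and has a higher target value.
x = [0] * 500 + [1] * 500
s = [1] * 50 + [0] * 450 + [1] * 200 + [0] * 300
y = [10.0] * 500 + [20.0] * 500          # true population mean is 15

p_hat = fit_boosted_propensity(x, s)
in_sample = [i for i, si in enumerate(s) if si == 1]
naive = sum(y[i] for i in in_sample) / len(in_sample)
adjusted = weighted_mean([y[i] for i in in_sample],
                         [p_hat[i] for i in in_sample])
print(naive, adjusted)
```

In this toy setting the unweighted sample mean overstates the population mean because one group is over-represented, while the propensity-weighted estimate moves back toward the true value of 15. A production analysis would more likely use an established boosting library and several covariates.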
Authors
Liu Zhan (刘展)
Pan Yingli (潘莹丽)
Source
Statistical Research (《统计研究》)
CSSCI
PKU Core Journal (北大核心)
2019, No. 9, pp. 93-103 (11 pages)
Funding
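Supported by the National Social Science Fund of China, General Program, "Research on Model-Based Inference for Web Survey Samples in the Context of Big Data" (18BTJ022).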
Keywords
Big Data
Web Survey Samples
Inclusion Probability
Target Variables
Modeling Inference