Abstract
In the big data era, applying respondent-driven sampling (RDS) to network sample surveys addresses several problems of traditional sample surveys: the difficulty of obtaining a usable sampling frame, of reaching respondents, and of obtaining responses. It also allows network surveys to use probability sampling and to estimate population parameters within a given margin of error. In practice, however, homophily (the tendency of sample units to recruit peers who share their own attributes) inflates the variance of RDS estimators. To address this problem, the paper assumes that the target population follows a degree-corrected stochastic block model (DCSBM), post-stratifies the sample by block using the empirical transition probabilities between blocks, and proposes the PS-IPW estimator, which combines post-stratification with inverse probability weighting. Simulations of target-population social networks with different levels of homophily, and of RDS sampling on those networks, are used to compare the relative efficiency of the PS-IPW estimator. An empirical analysis, in which the stratification variable is selected using the spectral properties of the sample block matrix, further verifies the applicability of RDS and the effectiveness of the PS-IPW estimator.
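The general idea of combining post-stratification with inverse probability weighting can be illustrated with a minimal sketch. This is not the paper's PS-IPW estimator: it assumes, as in standard RDS weighting, that a unit's inclusion probability is proportional to its network degree, and it takes the block shares as a given input (`block_shares` is a hypothetical argument; the paper derives its stratification from empirical between-block transition probabilities, which is not reproduced here). The function names are likewise illustrative.

```python
from collections import defaultdict


def ipw_mean(values, degrees):
    """Ratio-type inverse-probability-weighted mean.

    Assumption: inclusion probability is proportional to degree,
    so each unit gets weight 1/degree.
    """
    weights = [1.0 / d for d in degrees]
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


def ps_ipw_mean(values, degrees, blocks, block_shares):
    """Post-stratified IPW mean (illustrative sketch only).

    Computes an IPW mean inside each block (stratum), then combines
    the block means using the supplied population shares.
    """
    by_block = defaultdict(lambda: ([], []))
    for y, d, b in zip(values, degrees, blocks):
        by_block[b][0].append(y)
        by_block[b][1].append(d)
    return sum(
        block_shares[b] * ipw_mean(ys, ds)
        for b, (ys, ds) in by_block.items()
    )


# Hypothetical sample: binary outcome, reported degree, block label.
values = [1, 0, 1, 0]
degrees = [1, 2, 2, 4]
blocks = ["a", "a", "b", "b"]
estimate = ps_ipw_mean(values, degrees, blocks, {"a": 0.5, "b": 0.5})
```

When recruitment is strongly concentrated within blocks, weighting the block means by fixed shares keeps an over-recruited block from dominating the estimate, which is the intuition behind stratifying after the sample is drawn.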
Authors
蒋妍
孟珠峰
王天佳
刘晓宇
JIANG Yan; MENG Zhufeng; WANG Tianjia; LIU Xiaoyu (Center for Applied Statistics, Renmin University of China, Beijing 100872; School of Statistics, Renmin University of China, Beijing 100872; Institute of Survey Technology, Renmin University of China, Beijing 100872)
Source
《系统科学与数学》
CSCD
Peking University Core Journals (北大核心)
2022, Issue 1, pp. 85-99 (15 pages)
Journal of Systems Science and Mathematical Sciences
Funding
Supported by the Major Project of Philosophy and Social Sciences Research, Ministry of Education of China (20JZD023).