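Abstract
Missing data are a common problem in practical data analysis. This paper combines the inverse probability weighting method with imputation to propose a Mallows model averaging method for handling missing data, and proves that the resulting estimator is asymptotically optimal in the sense of achieving the lowest possible squared error. Compared with the traditional inverse probability weighting method, the proposed method not only makes full use of the observed information but can also be applied under missing not at random. Compared with methods based entirely on imputation, it inherits some advantages of imputation while avoiding the bias caused by erroneously imputing large blocks of data. In numerical simulations, it is first verified that three simple imputation methods satisfy the conditions under which asymptotic optimality holds; the proposed Mallows model averaging method is then compared with existing model averaging methods for missing data, and the results show that the new method outperforms the other existing model averaging methods in most cases. Finally, the new method is applied to life expectancy data, and the empirical results further indicate that it is more robust than existing model averaging methods.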
Missing data is a common issue in real data analysis. In this paper, we combine the inverse probability weighting method with the imputation method and propose a Mallows model averaging method for missing data. We prove that the proposed method asymptotically achieves the lowest possible squared error. Compared with the traditional inverse probability weighting method, the proposed method not only makes full use of the observed information but can also be applied to data that are missing not at random. Our method also inherits some advantages of the imputation method and avoids the bias caused by the erroneous imputation of large data blocks. Simulation results show that three common imputation methods satisfy the conditions under which asymptotic optimality is established, and that the proposed method is superior to some existing model averaging methods for missing data. We also apply the proposed method to life expectancy data.
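The weight-selection idea behind Mallows model averaging can be illustrated with a minimal sketch. This is not the estimator of the paper: it assumes fully observed (or already imputed) data, nested least-squares candidate models, and it omits inverse probability weighting entirely. The function name `mallows_weights`, the Frank-Wolfe solver, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def mallows_weights(y, X, n_iter=500):
    """Mallows model averaging weights for nested OLS candidates.

    Hypothetical helper: candidate m uses the first m columns of X,
    sigma^2 is estimated from the largest candidate, and the weights
    are restricted to the unit simplex.
    """
    n, p = X.shape
    # Fitted values of each nested candidate model (n x p matrix).
    fits = np.column_stack([
        X[:, :m] @ np.linalg.lstsq(X[:, :m], y, rcond=None)[0]
        for m in range(1, p + 1)
    ])
    k = np.arange(1, p + 1, dtype=float)   # number of parameters per candidate
    resid_full = y - fits[:, -1]
    sigma2 = resid_full @ resid_full / (n - p)

    # Minimise C(w) = ||y - fits w||^2 + 2 * sigma2 * k'w over the simplex
    # with Frank-Wolfe steps and an exact line search (C is quadratic in w).
    w = np.full(p, 1.0 / p)
    for _ in range(n_iter):
        grad = -2.0 * fits.T @ (y - fits @ w) + 2.0 * sigma2 * k
        s = np.zeros(p)
        s[np.argmin(grad)] = 1.0           # best vertex of the simplex
        d = s - w
        slope = grad @ d
        if slope >= -1e-12:                # no descent direction left
            break
        curv = 2.0 * np.sum((fits @ d) ** 2)
        gamma = 1.0 if curv == 0.0 else min(1.0, -slope / curv)
        w = w + gamma * d
    return w, fits @ w

# Simulated example (assumed data, two irrelevant regressors).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
beta = np.array([1.0, 2.0, 0.5, 0.0, 0.0])
y = X @ beta + rng.normal(size=n)
w, y_hat = mallows_weights(y, X)
```

Each iterate is a convex combination of points on the simplex, so the weights stay non-negative and sum to one without an explicit projection step.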
Authors
祝恒坤
张海丽
ZHU Hengkun; ZHANG Haili (School of Mathematical Sciences, Capital Normal University, Beijing 100048; Department of Statistics and Data Science, Southern University of Science and Technology, Shenzhen 51805)
Source
《系统科学与数学》
CSCD
Peking University Core Journal
2022, No. 4, pp. 1032-1059 (28 pages)
Journal of Systems Science and Mathematical Sciences
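Funding
National Natural Science Foundation of China (12031016, 11971323)
Project funded by the Academy for Multidisciplinary Studies and the Biostatistics interdisciplinary program of Capital Normal University.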
Keywords
missing not at random
imputation
inverse probability weighting
model averaging
asymptotic optimality