Semi-supervised Inference for Case-control Binary Data under Possibly Mis-specified Logistic Models
Abstract Semi-supervised data consist of a labeled data set with both responses and covariates and an unlabeled data set with covariates only. Inference based on semi-supervised data is attracting growing interest in statistics. When the response in the labeled data is binary, case-control sampling is commonly used to alleviate the imbalanced data structure, and the logistic regression model is a common choice of statistical model. When the response and the covariates satisfy the logistic model, the slope parameter of the model can be consistently estimated even under case-control sampling. In practice, however, the model assumption is often wrong. When the logistic model is incorrectly specified, the labeled case-control data alone cannot identify the population case proportion, and therefore cannot consistently estimate the target parameter, namely the population risk minimizer. With the help of the unlabeled data, we derive a consistent estimator of the population case proportion. Based on this estimator, an inverse probability weighted loss function is constructed to correct the sampling bias in the case-control data and to obtain a consistent estimator of the population risk minimizer. The proposed estimators are shown to be consistent and asymptotically normal, and the limiting variance-covariance matrix can be consistently estimated from the observed data. Simulation results show that the proposed method has reasonable finite-sample performance. A real data example is also analyzed for illustration.
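The inverse probability weighting step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' estimator: the abstract does not give the form of the case-proportion estimator or of the loss, so the sketch assumes the population case proportion is already available as an input (`case_prop`) and uses the standard class-share reweighting for case-control samples. The function names `case_control_weights` and `fit_ipw_logistic` are hypothetical.

```python
import numpy as np


def case_control_weights(y, case_prop):
    """Weight each labeled point by (population class share) / (sample class share).

    Assumption: case_prop is an estimate of P(Y = 1) in the population,
    obtained elsewhere (in the paper, with the help of unlabeled data).
    """
    y = np.asarray(y, dtype=float)
    sample_case_share = y.mean()
    w_case = case_prop / sample_case_share
    w_ctrl = (1.0 - case_prop) / (1.0 - sample_case_share)
    return np.where(y == 1, w_case, w_ctrl)


def fit_ipw_logistic(X, y, case_prop, n_iter=50, tol=1e-10):
    """Minimize the weighted logistic loss with Newton's method.

    X must already contain an intercept column. The weights undo the
    over-representation of cases in the case-control sample.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = case_control_weights(y, case_prop)
    n = len(y)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))          # fitted probabilities
        grad = X.T @ (w * (p - y)) / n               # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X / n
        step = np.linalg.solve(hess, grad)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

With these weights, the weighted share of cases in the labeled sample equals `case_prop`, so the weighted loss targets the population risk rather than the risk under the case-control sampling distribution. Plain Newton iterations are used for brevity; they can fail on separable data, where a more careful optimizer would be needed.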
Authors QUAN Zhuojun (全卓君); ZHENG Ming (郑明); YU Wen (郁文) (Department of Statistics and Data Science, School of Management, Fudan University, Shanghai, 200433, China)
Source Chinese Journal of Applied Probability and Statistics (《应用概率统计》), CSCD, PKU Core, 2023, No. 5, pp. 730-746 (17 pages)
Funding Supported by the National Natural Science Foundation of China (Grant Nos. 12271106, 12071088).
Keywords case-control sampling; imbalanced data structure; inverse probability weighting; model mis-specification; population risk minimizer; semi-supervised inference