Missing data approaches for probability regression models with missing outcomes with applications

Author information

Qi Li, Sun Yanqing

Publication information

J Stat Distrib Appl. 2014;1. doi: 10.1186/s40488-014-0023-3. Epub 2014 Dec 16.

DOI:10.1186/s40488-014-0023-3

PMID:26900543

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4757472/

Abstract

In this paper, we investigate several well-known approaches to missing data, and the relationships among them, for the parametric probability regression model $P_\beta(Y \mid X)$ when the outcome of interest $Y$ is subject to missingness. We explore the relationships between the mean score method, the inverse probability weighting (IPW) method and the augmented inverse probability weighted (AIPW) method, with some interesting findings. The asymptotic distributions of the IPW and AIPW estimators are derived and their efficiencies are compared. Our analysis details how the AIPW estimator may gain efficiency over the IPW estimator through estimation of the validation probability and through augmentation. We show that the AIPW estimator based on augmentation using the full set of observed variables is more efficient than the AIPW estimator based on augmentation using only a subset of the observed variables. The developed approaches are applied to the Poisson regression model with missing outcomes, based on auxiliary outcomes and a validated sample for the true outcomes. We show that, by stratifying on a set of discrete variables, the proposed statistical procedure can be formulated to analyze automated records that contain only summarized information at categorical levels. The proposed methods are applied to analyze influenza vaccine efficacy in an influenza vaccine study conducted in Temple-Belton, Texas during the 2000-2001 influenza season.
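To make the IPW and AIPW ideas concrete, the following is a minimal sketch on simulated data; it is not the authors' implementation. Everything in it is an illustrative assumption: a binary covariate `x`, a binary auxiliary outcome `a`, a validation indicator `xi`, validation probabilities estimated by cell frequencies over `(x, a)`, and the conditional mean of the outcome estimated by cell means in the validated sample. Because the Poisson score is linear in the outcome, the AIPW estimating equation can be written here as an ordinary Poisson score applied to a pseudo-outcome.

```python
import numpy as np

def solve_poisson_score(X, y, w, n_iter=50, tol=1e-10):
    """Newton solve of sum_i w_i * x_i * (y_i - exp(x_i @ beta)) = 0."""
    beta = np.zeros(X.shape[1])
    # Column 0 of X is assumed to be the intercept.
    beta[0] = np.log(np.sum(w * y) / np.sum(w))
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (w * (y - mu))
        info = X.T @ (X * (w * mu)[:, None])
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data; every numeric value below is a hypothetical choice.
rng = np.random.default_rng(0)
n = 5000
beta_true = np.array([0.0, -0.7])
x = rng.binomial(1, 0.5, n)                    # binary covariate
X = np.column_stack([np.ones(n), x])
y = rng.poisson(np.exp(X @ beta_true))         # true outcome
a0 = (y > 0).astype(int)
a = np.where(rng.random(n) < 0.1, 1 - a0, a0)  # noisy auxiliary outcome
xi = rng.binomial(1, 0.3 + 0.4 * a)            # 1 if the true outcome is validated
y_obs = np.where(xi == 1, y, 0.0)              # unvalidated outcomes are never used

# Cell-based estimates of the validation probability and of E[Y | x, a].
cell = 2 * x + a
n_cell = np.bincount(cell, minlength=4)
n_val = np.bincount(cell, weights=xi, minlength=4)
sum_y = np.bincount(cell, weights=xi * y_obs, minlength=4)
pi_hat = (n_val / n_cell)[cell]
e_hat = (sum_y / n_val)[cell]

# IPW: weight each validated score contribution by 1 / pi_hat.
beta_ipw = solve_poisson_score(X, y_obs, xi / pi_hat)

# AIPW: add the augmentation term (1 - xi / pi_hat) * E_hat[Y | x, a].
y_pseudo = (xi / pi_hat) * y_obs + (1 - xi / pi_hat) * e_hat
beta_aipw = solve_poisson_score(X, y_pseudo, np.ones(n))

print("IPW :", beta_ipw)
print("AIPW:", beta_aipw)
```

In this fully discrete sketch, with a saturated regression model and cell-frequency estimates for both nuisance quantities, the two estimates agree up to solver tolerance, so the example illustrates the mechanics of the estimating equations rather than an efficiency gain.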
