Shaun R. Seaman, Stijn Vansteelandt
Medical Research Council Biostatistics Unit, University of Cambridge, Cambridge, UK.
Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium.
Stat Sci. 2018;33(2):184-197. doi: 10.1214/18-STS647.
Most methods for handling incomplete data can be broadly classified as inverse probability weighting (IPW) strategies or imputation strategies. The former model the occurrence of incomplete data; the latter, the distribution of the missing variables given observed variables in each missingness pattern. Imputation strategies are typically more efficient, but they can involve extrapolation, which is difficult to diagnose and can lead to large bias. Double robust (DR) methods combine the two approaches. They are typically more efficient than IPW and more robust to model misspecification than imputation. We give a formal introduction to DR estimation of the mean of a partially observed variable, before moving to more general incomplete-data scenarios. We review strategies to improve the performance of DR estimators under model misspecification, reveal connections between DR estimators for incomplete data and 'design-consistent' estimators used in sample surveys, and explain the value of double robustness when using flexible data-adaptive methods for IPW or imputation.
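The following is a minimal sketch, not the authors' code, that makes the comparison in the abstract concrete. It computes three estimators of the mean of a partially observed outcome Y with a fully observed covariate X, assuming the data are missing at random given X. For the DR estimator it uses the common augmented IPW form, mu_hat = (1/n) sum_i [ m_hat(X_i) + R_i (Y_i - m_hat(X_i)) / pi_hat(X_i) ]. Here R_i = 1 if Y_i is observed, pi_hat is a fitted model for P(R = 1 | X), and m_hat is a fitted model for E[Y | X]. The simulation settings (linear outcome, logistic missingness, n = 200,000) and the helper names `expit`, `fit_logistic`, `fit_linear` and `estimators` are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def expit(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(design, r, n_iter=25):
    """Logistic regression of the response indicator r on design (Newton-Raphson MLE)."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = expit(design @ beta)
        grad = design.T @ (r - p)
        hess = design.T @ (design * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def fit_linear(design, y, r):
    """OLS of y on design using the complete cases (r == 1) only."""
    obs = r == 1
    coef, *_ = np.linalg.lstsq(design[obs], y[obs], rcond=None)
    return coef


def estimators(y, r, pi_hat, m_hat):
    """Return (IPW, imputation, DR) estimates of E[Y].

    pi_hat: fitted P(R = 1 | X); m_hat: fitted E[Y | X]. Missing y entries are
    never used because they are replaced by 0 and multiplied by r = 0.
    """
    y0 = np.where(r == 1, y, 0.0)
    ipw = np.mean(r * y0 / pi_hat)
    imputation = np.mean(m_hat)
    dr = np.mean(m_hat + r * (y0 - m_hat) / pi_hat)
    return ipw, imputation, dr


# Hypothetical simulation: true E[Y] = 1, outcome missing at random given x.
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
r = rng.binomial(1, expit(0.5 + x))   # larger x -> more likely observed
y = np.where(r == 1, y, np.nan)        # hide the missing outcomes

design = np.column_stack([np.ones(n), x])
ones = np.ones((n, 1))
pi_right = expit(design @ fit_logistic(design, r))  # correct missingness model
pi_wrong = np.full(n, r.mean())                      # ignores x: misspecified
m_right = design @ fit_linear(design, y, r)          # correct outcome model
m_wrong = ones @ fit_linear(ones, y, r)              # intercept only: misspecified

cc_mean = np.nanmean(y)  # complete-case mean, biased upwards in this setup
results = {
    "IPW, correct pi": estimators(y, r, pi_right, m_wrong)[0],
    "imputation, correct m": estimators(y, r, pi_wrong, m_right)[1],
    "DR, correct pi, wrong m": estimators(y, r, pi_right, m_wrong)[2],
    "DR, wrong pi, correct m": estimators(y, r, pi_wrong, m_right)[2],
    "DR, both wrong": estimators(y, r, pi_wrong, m_wrong)[2],
}
print(f"complete-case mean: {cc_mean:.3f}")
for label, value in results.items():
    print(f"{label}: {value:.3f}")
```

In this sketch, the DR estimate lands near the true mean of 1 when only one of the two working models is correct. When both models are wrong, it is no better than the complete-case mean, and in this particular setup it coincides with it algebraically.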