Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK.
Department of Health Services Research and Policy, London School of Hygiene and Tropical Medicine, London, UK.
Biom J. 2020 Mar;62(2):428-443. doi: 10.1002/bimj.201900041. Epub 2020 Jan 29.
Missing data is a common issue in research using observational studies to investigate the effect of treatments on health outcomes. When missingness occurs only in the covariates, a simple approach is to use missing indicators to handle the partially observed covariates. The missing indicator approach has been criticized for giving biased results in outcome regression. However, recent papers have suggested that the missing indicator approach can provide unbiased results in propensity score analysis under certain assumptions. We consider assumptions under which the missing indicator approach can provide valid inferences, namely: (1) there is no unmeasured confounding within missingness patterns; (2) either (2a) the covariate values of patients with missing data are conditionally independent of treatment, or (2b) these values are conditionally independent of outcome; and (3) the outcome model is correctly specified, specifically, the true outcome model does not include interactions between missing indicators and fully observed covariates. We prove that, under these assumptions, the missing indicator approach with outcome regression can provide unbiased estimates of the average treatment effect. We use a simulation study to investigate the extent of bias in estimates of the treatment effect when the assumptions are violated, and we illustrate our findings using data from electronic health records. In conclusion, the missing indicator approach can provide valid inferences for outcome regression, but the plausibility of its assumptions must first be considered carefully.
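As a rough illustration of the missing indicator approach with outcome regression described above, the following Python sketch sets missing covariate values to a placeholder (0), adds an indicator for missingness, and fits the outcome model by ordinary least squares. It is not the authors' code: the function names `missing_indicator_ate` and `simulate`, the linear outcome model, and all numeric values (sample size, 30% missingness, coefficients) are assumptions chosen for illustration. The simulated data are built so that the covariate influences treatment only when it is observed, which is one way assumption (2a) could hold.

```python
import numpy as np


def missing_indicator_ate(y, z, x):
    """Hypothetical helper: OLS of y on treatment, filled covariate, and missing indicator.

    y: outcome, z: binary treatment, x: covariate with NaN where missing.
    Returns the coefficient on z, which equals the average treatment effect
    only if the linear outcome model (no treatment interactions) is correct.
    """
    r = np.isnan(x).astype(float)              # missing indicator: 1 if x is missing
    x_filled = np.where(np.isnan(x), 0.0, x)   # placeholder value for missing x
    design = np.column_stack([np.ones_like(y), z, x_filled, r])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]                             # coefficient on treatment


def simulate(n=20000, true_effect=1.0, seed=0):
    """Illustrative data in which x affects treatment only when it is observed."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    r = rng.binomial(1, 0.3, size=n)           # about 30% of x values are missing
    # Assumed mechanism for (2a): treatment ignores x when x is missing.
    logit = 0.8 * x * (1 - r)
    z = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
    y = true_effect * z + 0.5 * x + rng.normal(size=n)
    x_obs = np.where(r == 1, np.nan, x)        # hide the missing values from the analyst
    return y, z, x_obs


y, z, x_obs = simulate()
estimate = missing_indicator_ate(y, z, x_obs)
print(f"estimated treatment effect: {estimate:.3f}")
```

In this setup the estimate should fall close to the simulated effect of 1.0. If the `(1 - r)` factor is dropped from `logit`, the covariate confounds treatment and outcome even when it is missing, assumption (2a) no longer holds, and the estimate would be expected to drift away from the true effect.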