Zhang Zhiwei, Liu Wei, Zhang Bo, Tang Li, Zhang Jun
Division of Biostatistics, Office of Surveillance and Biometrics, Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, MD, USA
Department of Mathematics, Harbin Institute of Technology, Harbin, P.R. China
Stat Methods Med Res. 2016 Oct;25(5):2053-2066. doi: 10.1177/0962280213513758. Epub 2013 Dec 5.
Causal inference in observational studies is frequently challenged by the occurrence of missing data, in addition to confounding. Motivated by the Consortium on Safe Labor, a large observational study of obstetric labor practice and birth outcomes, this article focuses on the problem of missing exposure information in a causal analysis of observational data. This problem can be approached from different angles (i.e. missing covariates and causal inference), and useful methods can be obtained by drawing upon the available techniques and insights in both areas. In this article, we describe and compare a collection of methods based on different modeling assumptions, under standard assumptions for missing data (i.e. missing-at-random and positivity) and for causal inference with complete data (i.e. no unmeasured confounding and another positivity assumption). These methods involve three models: one for treatment assignment, one for the dependence of outcome on treatment and covariates, and one for the missing data mechanism. In general, consistent estimation of causal quantities requires correct specification of at least two of the three models, although there may be some flexibility as to which two models need to be correct. Such flexibility is afforded by doubly robust estimators adapted from the missing covariates literature and the literature on causal inference with complete data, and by a newly developed triply robust estimator that is consistent if any two of the three models are correct. The methods are applied to the Consortium on Safe Labor data and compared in a simulation study mimicking the Consortium on Safe Labor.
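As a point of reference for the doubly robust estimators from the complete-data causal inference literature mentioned above, the following is a minimal sketch of the standard augmented inverse-probability-weighted (AIPW) estimator of an average treatment effect. It is not the triply robust estimator developed in the article, and it assumes the exposure is fully observed, so no missing-data model appears. The function name `aipw_ate`, its arguments, and the toy data are hypothetical and chosen only for illustration; the fitted values are supplied directly rather than estimated from models.

```python
def aipw_ate(y, a, e, m1, m0):
    """Standard AIPW estimate of the average treatment effect (complete data).

    y  : observed outcomes
    a  : binary treatment indicators (1 = treated, 0 = untreated)
    e  : fitted propensity scores P(A = 1 | X) from a treatment-assignment model
    m1 : fitted outcome means E[Y | A = 1, X] from an outcome model
    m0 : fitted outcome means E[Y | A = 0, X] from an outcome model
    """
    if not (len(y) == len(a) == len(e) == len(m1) == len(m0)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for yi, ai, ei, m1i, m0i in zip(y, a, e, m1, m0):
        # Positivity: each propensity score must lie strictly between 0 and 1.
        if not 0.0 < ei < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        # Outcome-model prediction plus an inverse-probability-weighted residual.
        mu1 = m1i + ai * (yi - m1i) / ei
        mu0 = m0i + (1 - ai) * (yi - m0i) / (1 - ei)
        total += mu1 - mu0
    return total / len(y)


# Toy data (hypothetical): outcomes follow y = 1 + 2*a + x exactly.
x = [0.0, 1.0, 2.0, 3.0]
a = [1, 0, 1, 0]
y = [1 + 2 * ai + xi for ai, xi in zip(a, x)]

# Correct outcome model, deliberately misspecified constant propensity model.
m1 = [3 + xi for xi in x]
m0 = [1 + xi for xi in x]
e_wrong = [0.5] * len(x)

estimate = aipw_ate(y, a, e_wrong, m1, m0)
```

In this toy case the outcome model reproduces the data exactly, so every residual is zero and the estimate equals the outcome-model contrast even though the propensity scores are wrong. That is one direction of double robustness. The other direction (correct propensity model, wrong outcome model) holds only in large samples and is not demonstrated here.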