Frangakis Constantine E, Rubin Donald B, An Ming-Wen, MacKenzie Ellen
Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland 21205, USA.
Biometrics. 2007 Sep;63(3):641-9; discussion 650-62. doi: 10.1111/j.1541-0420.2007.00847_1.x.
We consider studies of cohorts of individuals after a critical event, such as an injury, with the following characteristics. First, the studies are designed to measure "input" variables, which describe the period before the critical event, and to characterize the distribution of the input variables in the cohort. Second, the studies are designed to measure "output" variables, primarily mortality after the critical event, and to characterize the predictive (conditional) distribution of mortality given the input variables in the cohort. Such studies often possess the complication that the input data are missing for those who die shortly after the critical event because the data collection takes place after the event. Standard methods of dealing with the missing inputs, such as imputation or weighting methods based on an assumption of ignorable missingness, are known to be generally invalid when the missingness of inputs is nonignorable, that is, when the distribution of the inputs is different between those who die and those who live. To address this issue, we propose a novel design that obtains and uses information on an additional key variable: a treatment or externally controlled variable that, if set at its "effective" level, could have prevented the death of those who died. We show that the new design can be used to draw valid inferences for the marginal distribution of inputs in the entire cohort, and for the conditional distribution of mortality given the inputs, also in the entire cohort, even under nonignorable missingness. The crucial framework that we use is principal stratification based on the potential outcomes, here mortality under both levels of treatment. We also show, using illustrative preliminary injury data, that our approach can reveal results that are more reasonable than the results of standard methods, in relatively dramatic ways. Thus, our approach suggests that the routine collection of data on variables that could be used as possible treatments in such studies of inputs and mortality should become common.
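The problem of nonignorable missingness described above can be illustrated with a small simulation. The sketch below does not implement the principal stratification design proposed here; it only shows why an analysis restricted to survivors misstates the distribution of an input when that input also affects mortality. The cohort size, the standard normal input, and the death probabilities (0.6 and 0.1) are hypothetical values chosen for illustration and are not taken from the injury data.

```python
import random


def simulate_cohort(n=20000, seed=0):
    """Simulate a hypothetical cohort in which the input X affects early death."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # hypothetical input, e.g. a pre-event risk score
        # Assumption for illustration: a higher input means a higher risk of death.
        p_death = 0.6 if x > 0 else 0.1
        died = rng.random() < p_death
        cohort.append((x, died))
    return cohort


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


cohort = simulate_cohort()

# Mean of the input in the entire cohort (unobservable in practice, because
# inputs are missing for those who die shortly after the event).
true_mean = mean(x for x, _ in cohort)

# Mean of the input among survivors only, which is what a complete-case
# analysis would report.
survivor_mean = mean(x for x, died in cohort if not died)

print(f"entire cohort mean of input: {true_mean:.3f}")
print(f"survivors-only mean of input: {survivor_mean:.3f}")
```

In this sketch the survivors-only mean falls well below the mean in the entire cohort, because individuals with high input values are more likely to die and so drop out of the observed data.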