Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts.
Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Rockville, Maryland.
Am J Epidemiol. 2018 Mar 1;187(3):585-591. doi: 10.1093/aje/kwx350.
Missing data is a common occurrence in epidemiologic research. In this paper, 3 data sets with induced missing values from the Collaborative Perinatal Project, a multisite US study conducted from 1959 to 1974, are provided as examples of prototypical epidemiologic studies with missing data. Our goal was to estimate the association of maternal smoking behavior with spontaneous abortion while adjusting for numerous confounders. At the same time, we did not necessarily wish to evaluate the joint distribution among potentially unobserved covariates, which is seldom the subject of substantive scientific interest. The inverse probability weighting (IPW) approach preserves the semiparametric structure of the underlying model of substantive interest and clearly separates the model of substantive interest from the model used to account for the missing data. However, IPW often will not result in valid inference if the missing-data pattern is nonmonotone, even if the data are missing at random. We describe a recently proposed approach to modeling nonmonotone missing-data mechanisms under missingness at random to use in constructing the weights in IPW complete-case estimation, and we illustrate the approach using 3 data sets described in a companion article (Am J Epidemiol. 2018;187(3):568-575).
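As a rough illustration of IPW complete-case estimation, the Python sketch below simulates a study in which a binary exposure is sometimes missing, weights each complete case by the inverse of its estimated probability of being complete, and fits a weighted logistic regression. It is a minimal sketch under stated assumptions, not the method of the paper: the data are simulated rather than drawn from the Collaborative Perinatal Project, only one variable is subject to missingness (so the pattern is monotone, not nonmonotone), and the variable names, coefficient values, and the helper functions `fit_weighted_logistic` and `stratum_weights` are hypothetical.

```python
import numpy as np

def fit_weighted_logistic(X, y, w, n_iter=25):
    """Weighted logistic regression fitted by Newton-Raphson; returns coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))                       # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def stratum_weights(observed, y, z):
    """Inverse of the empirical P(complete | y, z) within each (y, z) stratum."""
    prob = np.empty(len(y))
    for yv in (0, 1):
        for zv in (0, 1):
            s = (y == yv) & (z == zv)
            prob[s] = observed[s].mean()
    return 1.0 / prob

# Simulated data (all values below are assumptions chosen for illustration).
rng = np.random.default_rng(0)
n = 20000
z = rng.binomial(1, 0.4, n)                  # fully observed binary confounder
smoke = rng.binomial(1, 0.2 + 0.2 * z)       # exposure, later partly missing
true_beta = np.array([-1.5, 0.7, 0.5])       # intercept, exposure, confounder
design = np.column_stack([np.ones(n), smoke, z])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-design @ true_beta)))

# Missing at random: whether the exposure is observed depends only on the
# fully observed outcome y and confounder z.
p_obs = 0.9 - 0.4 * y - 0.2 * z + 0.3 * y * z
observed = rng.binomial(1, p_obs).astype(bool)

# The weights come from a model for missingness that is separate from the
# outcome model; here it is a saturated model in (y, z).
weights = stratum_weights(observed, y, z)

cc = observed  # complete cases
beta_cc = fit_weighted_logistic(design[cc], y[cc], np.ones(cc.sum()))
beta_ipw = fit_weighted_logistic(design[cc], y[cc], weights[cc])

print("unweighted complete-case coefficients:", beta_cc)
print("IPW complete-case coefficients:       ", beta_ipw)
```

The unweighted complete-case fit is included only for comparison. Because the weights here are stratum-specific empirical proportions, the weighted complete cases sum back to the full sample size. With several variables missing in a nonmonotone pattern, the probability of being a complete case would need a model such as the one the abstract refers to in place of `stratum_weights`.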