Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Melbourne, Victoria, Australia.
Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, Victoria, Australia.
Am J Epidemiol. 2018 Dec 1;187(12):2705-2715. doi: 10.1093/aje/kwy173.
With incomplete data, the "missing at random" (MAR) assumption is widely understood to enable unbiased estimation with appropriate methods. While the need to assess the plausibility of MAR and to perform sensitivity analyses considering "missing not at random" (MNAR) scenarios has been emphasized, the practical difficulty of these tasks is rarely acknowledged. With multivariable missingness, what MAR means is difficult to grasp, and in many MNAR scenarios unbiased estimation is possible using methods commonly associated with MAR. Directed acyclic graphs (DAGs) have been proposed as an alternative framework for specifying practically accessible assumptions beyond the MAR-MNAR dichotomy. However, there is currently no general algorithm for deciding how to handle the missing data given a specific DAG. Here we construct "canonical" DAGs capturing typical missingness mechanisms in epidemiologic studies with incomplete data on exposure, outcome, and confounding factors. For each DAG, we determine whether common target parameters are "recoverable," meaning that they can be expressed as functions of the available data distribution and thus estimated consistently, or whether sensitivity analyses are necessary. We investigate the performance of available-case and multiple-imputation procedures. Using data from waves 1-3 of the Longitudinal Study of Australian Children (2004-2008), we illustrate how our findings can guide the treatment of missing data in point-exposure studies.
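The claim that some MNAR scenarios still permit unbiased estimation can be illustrated with a small simulation. This is a minimal sketch under assumed settings, not the authors' analysis: the linear outcome model, the logistic missingness model, the sample size, and the function name `simulate` are all hypothetical choices made for illustration. It compares the complete-record regression slope of an outcome `y` on an exposure `x` when records are dropped according to `x` itself (MNAR if the missing variable is `x`) and when they are dropped according to `y`.

```python
import numpy as np


def simulate(n=200_000, seed=0):
    """Complete-record slope of y on x under two assumed missingness mechanisms."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 1.0 * x + rng.normal(size=n)  # assumed true slope: 1.0

    def slope(complete):
        # OLS slope of y on x using only the records flagged as complete
        return np.polyfit(x[complete], y[complete], 1)[0]

    def complete_records(driver):
        # A record is complete with probability 1 / (1 + exp(driver)),
        # so larger values of the driver variable are more often missing.
        p_complete = 1.0 / (1.0 + np.exp(driver))
        return rng.random(n) < p_complete

    return {
        "full_data": slope(np.ones(n, dtype=bool)),
        # Exposure missing depending on its own value: MNAR, but the
        # complete-record slope stays close to the true value.
        "missingness_driven_by_x": slope(complete_records(x)),
        # Missingness depending on the outcome: the complete-record
        # slope is pulled away from the true value.
        "missingness_driven_by_y": slope(complete_records(y)),
    }


if __name__ == "__main__":
    for scenario, estimate in simulate().items():
        print(f"{scenario}: {estimate:.3f}")
```

In this sketch the slope is recovered when the chance of a complete record depends only on the exposure, and is attenuated when it depends on the outcome. This matches the abstract's point that the MAR-MNAR label alone does not determine whether a target parameter is recoverable.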