Sullivan Thomas R, Salter Amy B, Ryan Philip, Lee Katherine J
Am J Epidemiol. 2015 Sep 15;182(6):528-34. doi: 10.1093/aje/kwv100. Epub 2015 Sep 2.
Multiple imputation (MI) is increasingly being used to handle missing data in epidemiologic research. When data on both the exposure and the outcome are missing, an alternative to standard MI is the "multiple imputation, then deletion" (MID) method, which involves deleting imputed outcomes prior to analysis. While MID has been shown to provide efficiency gains over standard MI when analysis and imputation models are the same, the performance of MID in the presence of auxiliary variables for the incomplete outcome is not well understood. Using simulated data, we evaluated the performance of standard MI and MID in regression settings where data were missing on both the outcome and the exposure and where an auxiliary variable associated with the incomplete outcome was included in the imputation model. When the auxiliary variable was unrelated to missingness in the outcome, both standard MI and MID produced negligible bias when estimating regression parameters, with standard MI being more efficient in most settings. However, when the auxiliary variable was also associated with missingness in the outcome, MID produced markedly biased parameter estimates. On the basis of these results, we recommend that researchers use standard MI rather than MID in the presence of auxiliary variables associated with an incomplete outcome.
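To make the difference between the two approaches concrete, the following is a minimal sketch in Python, not the authors' simulation code. It assumes a linear regression of the outcome `y` on the exposure `x`, an auxiliary variable `z` that is used only in the imputation model, and a simple chained-equations scheme in which imputation parameters are estimated on bootstrap resamples. All function names, sample sizes, and coefficient values (including `gamma`, which controls how strongly `z` drives missingness in `y`) are hypothetical choices made for illustration. The only difference between the two analyses is the set of rows retained: standard MI keeps every row, whereas MID drops rows whose outcome was imputed.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares: coefficients, their covariance, residual variance."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta, sigma2 * XtX_inv, sigma2

def impute_once(target, predictors, missing, rng):
    """Stochastic regression imputation of `target`.

    The imputation model is fitted on a bootstrap resample of the rows where
    `target` was observed, so parameter uncertainty carries into the draws.
    """
    X = np.column_stack([np.ones(len(target)), predictors])
    obs = np.flatnonzero(~missing)
    boot = rng.choice(obs, size=len(obs), replace=True)
    beta, _, sigma2 = ols(X[boot], target[boot])
    filled = target.copy()
    filled[missing] = X[missing] @ beta + rng.normal(0.0, np.sqrt(sigma2), missing.sum())
    return filled

def pool(estimates, variances):
    """Rubin's rules: pooled estimate and standard error."""
    m = len(estimates)
    total_var = np.mean(variances) + (1 + 1 / m) * np.var(estimates, ddof=1)
    return float(np.mean(estimates)), float(np.sqrt(total_var))

def compare_mi_mid(x, y, z, m=20, cycles=5, seed=0):
    """Pooled slope of y on x under standard MI and under MID."""
    rng = np.random.default_rng(seed)
    x_mis, y_mis = np.isnan(x), np.isnan(y)
    results = {"MI": ([], []), "MID": ([], [])}
    for _ in range(m):
        # Start from mean-filled values, then cycle the two imputation models.
        xi = np.where(x_mis, np.nanmean(x), x)
        yi = np.where(y_mis, np.nanmean(y), y)
        for _ in range(cycles):
            xi = impute_once(xi, np.column_stack([yi, z]), x_mis, rng)
            yi = impute_once(yi, np.column_stack([xi, z]), y_mis, rng)
        design = np.column_stack([np.ones(len(xi)), xi])
        # Standard MI analyses all rows; MID deletes rows with an imputed outcome.
        for name, keep in (("MI", np.ones(len(y), dtype=bool)), ("MID", ~y_mis)):
            beta, cov, _ = ols(design[keep], yi[keep])
            results[name][0].append(beta[1])
            results[name][1].append(cov[1, 1])
    return {name: pool(est, var) for name, (est, var) in results.items()}

def simulate(n=500, gamma=0.0, seed=1):
    """Hypothetical data: z is associated with y; gamma links z to missingness in y."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 1.0 + 0.5 * x + rng.normal(size=n)
    z = y + rng.normal(size=n)
    p_y_missing = 1 / (1 + np.exp(-(-1.0 + gamma * z)))
    y_obs = np.where(rng.random(n) < p_y_missing, np.nan, y)
    x_obs = np.where(rng.random(n) < 0.3, np.nan, x)  # exposure missing at random rate 0.3
    return x_obs, y_obs, z

if __name__ == "__main__":
    for gamma in (0.0, 1.0):
        print(gamma, compare_mi_mid(*simulate(gamma=gamma)))
```

Setting `gamma=0.0` corresponds to an auxiliary variable unrelated to missingness in the outcome, and a nonzero `gamma` to one that is associated with it. A single run like this does not measure bias or efficiency; doing so would require repeating the simulation over many data sets and comparing the pooled slopes with the true value used to generate the data (0.5 in this sketch).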