Pfizer Worldwide Research and Development, Cambridge, Massachusetts.
Department of Biostatistics, University of Kentucky, Lexington, Kentucky.
Stat Med. 2020 Apr 15;39(8):1156-1166. doi: 10.1002/sim.8468. Epub 2020 Jan 29.
Multiple imputation by chained equations (MICE) has emerged as a leading strategy for imputing missing epidemiological data due to its ease of implementation and ability to maintain unbiased effect estimates and valid inference. Within the MICE algorithm, imputation can be performed using a variety of parametric or nonparametric methods. Literature has suggested that nonparametric tree-based imputation methods outperform parametric methods in terms of bias and coverage when there are interactions or other nonlinear effects among the variables. However, these studies fail to provide a fair comparison as they do not follow the well-established recommendation that any effects in the final analysis model (including interactions) should be included in the parametric imputation model. We show via simulation that properly incorporating interactions in the parametric imputation model leads to much better performance. In fact, correctly specified parametric imputation and tree-based random forest imputation perform similarly when estimating the interaction effect. Parametric imputation leads to slightly higher coverage for the interaction effect, but it has wider confidence intervals than random forest imputation and requires correct specification of the imputation model. Epidemiologists should take care in specifying MICE imputation models, and this paper assists in that task by providing a fair comparison of parametric and tree-based imputation in MICE.
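The recommendation that interactions in the analysis model should also appear in the parametric imputation model can be illustrated with a minimal sketch. This is not the authors' simulation code. It assumes an analysis model with an x1-by-x2 interaction, missingness in x2 only, and a single stochastic regression imputation step as a simplified stand-in for one pass of MICE (a proper multiple imputation would also draw the regression parameters and repeat the process several times). The names `design_matrix` and `impute_x2`, and all simulated values, are hypothetical.

```python
import numpy as np

def design_matrix(x1, y, include_interaction):
    """Predictors for the x2 imputation model: intercept, x1, y, optionally x1*y."""
    cols = [np.ones_like(x1), x1, y]
    if include_interaction:
        # Carry the analysis model's interaction structure into the imputation model.
        cols.append(x1 * y)
    return np.column_stack(cols)

def impute_x2(x1, x2, y, include_interaction, rng):
    """One stochastic regression imputation of x2 (simplified; not full MICE)."""
    missing = np.isnan(x2)
    X = design_matrix(x1, y, include_interaction)
    # Fit the imputation model on the rows where x2 is observed.
    coef, *_ = np.linalg.lstsq(X[~missing], x2[~missing], rcond=None)
    resid = x2[~missing] - X[~missing] @ coef
    dof = (~missing).sum() - X.shape[1]
    sigma = np.sqrt(resid @ resid / dof)
    # Fill missing entries with the prediction plus random residual noise.
    filled = x2.copy()
    filled[missing] = X[missing] @ coef + rng.normal(0.0, sigma, missing.sum())
    return filled

# Simulated data (assumed values): the outcome depends on an x1*x2 interaction.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2_full = rng.normal(size=n)
y = 1.0 + x1 + x2_full + 0.5 * x1 * x2_full + rng.normal(size=n)

# Missingness in x2 depends only on the fully observed x1 (missing at random).
p_miss = 1.0 / (1.0 + np.exp(-(x1 - 1.0)))
x2 = np.where(rng.uniform(size=n) < p_miss, np.nan, x2_full)

x2_main_only = impute_x2(x1, x2, y, include_interaction=False, rng=rng)
x2_with_interaction = impute_x2(x1, x2, y, include_interaction=True, rng=rng)
```

The two calls differ only in whether the imputation model's design matrix contains the product term. Refitting the analysis model on each completed data set, over many replications, is the kind of comparison the abstract describes.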