Kim Soeun, Sugar Catherine A, Belin Thomas R
Department of Biostatistics, School of Public Health, University of Texas Health Science Center, Houston, Texas, 77030, U.S.A.
Stat Med. 2015 May 20;34(11):1876-88. doi: 10.1002/sim.6435. Epub 2015 Jan 29.
Imputation strategies are widely used in settings that involve inference with incomplete data. However, implementation of a particular approach always rests on assumptions, and subtle distinctions between methods can have an impact on subsequent analyses. In this research article, we are concerned with regression models in which the true underlying relationship includes interaction terms. We focus in particular on a linear model with one fully observed continuous predictor, a second partially observed continuous predictor, and their interaction. We derive the conditional distribution of the missing covariate and interaction term given the observed covariate and the outcome variable, and examine the performance of a multiple imputation procedure based on this distribution. We also investigate several alternative procedures that can be implemented by adapting multivariate normal multiple imputation software in ways that might be expected to perform well despite incompatibilities between model assumptions and true underlying relationships among the variables. The methods are compared in terms of bias, coverage, and confidence interval (CI) width. As expected, the procedure based on the correct conditional distribution performs well across all scenarios. Just as importantly for general practitioners, several of the approaches based on multivariate normality perform comparably with the correct conditional distribution in a number of circumstances. Interestingly, however, procedures that seek to preserve the multiplicative relationship between the interaction term and the main effects are found to be substantially less reliable. For illustration, the various procedures are applied to an analysis of post-traumatic stress disorder symptoms in a study of childhood trauma.
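To make the idea of imputing from a conditional distribution concrete, the following is a minimal sketch and not the authors' derivation. It assumes the outcome model y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + e with e ~ N(0, sigma2), and additionally assumes x2 given x1 is N(a0 + a1*x1, tau2); that second assumption, the function names, and the parameter names are all introduced here for illustration. Under these assumptions, x2 given (x1, y) is normal, because y is linear in x2 once x1 is fixed.

```python
import numpy as np

def cond_x2_given_x1_y(x1, y, beta, sigma2, a0, a1, tau2):
    """Mean and variance of x2 | x1, y under the assumed (hypothetical) model.

    Assumptions, not taken from the abstract:
      y  | x1, x2 ~ N(b0 + b1*x1 + b2*x2 + b3*x1*x2, sigma2)
      x2 | x1     ~ N(a0 + a1*x1, tau2)
    """
    b0, b1, b2, b3 = beta
    slope = b2 + b3 * x1          # coefficient on x2 once x1 is held fixed
    prior_mean = a0 + a1 * x1     # E[x2 | x1]
    # Precision of x2 | x1, y = prior precision + precision contributed by y.
    var = 1.0 / (1.0 / tau2 + slope**2 / sigma2)
    mean = var * (prior_mean / tau2 + slope * (y - b0 - b1 * x1) / sigma2)
    return mean, var

def impute_once(x1, x2, y, beta, sigma2, a0, a1, tau2, rng):
    """Fill NaN entries of x2 with one draw each; return x2 and x1*x2."""
    x2_imp = x2.copy()
    miss = np.isnan(x2)
    mean, var = cond_x2_given_x1_y(x1[miss], y[miss], beta, sigma2, a0, a1, tau2)
    x2_imp[miss] = rng.normal(mean, np.sqrt(var))
    # The interaction is recomputed from the imputed covariate.
    return x2_imp, x1 * x2_imp
```

This sketch treats the parameters as known. A full multiple imputation procedure would instead draw the parameters from their posterior distribution before each imputation, repeat the process several times, and combine the resulting estimates with Rubin's rules.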