Morris Tim P, White Ian R, Royston Patrick
Hub for Trials Methodology Research, MRC Clinical Trials Unit at UCL, Aviation House, 125 Kingsway, WC2B 6NH, London, UK.
BMC Med Res Methodol. 2014 Jun 5;14:75. doi: 10.1186/1471-2288-14-75.
Multiple imputation is a commonly used method for handling incomplete covariates, as it can provide valid inference when data are missing at random. This validity depends on correctly specifying the parametric model used to impute missing values, which may be difficult in many realistic settings. Imputation by predictive mean matching (PMM) borrows an observed value from a donor with a similar predictive mean; imputation by local residual draws (LRD) instead borrows the donor's residual. Both methods relax some assumptions of parametric imputation, promising greater robustness when the imputation model is misspecified.
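The sketch below illustrates both mechanisms in Python. It imputes a single incomplete continuous covariate `x` from one fully observed variable `z` by linear regression. It is an illustration under stated assumptions, not the authors' code. The function names `fit_centred` and `impute_once` are hypothetical. Parameters are drawn from the usual noninformative-prior posterior. Donors are matched on predictive means computed from the fitted coefficients, while recipients use the drawn coefficients (a variant sometimes called type 1 matching). The sketch does not settle which matching metric is best. Finally, the LRD residual is taken as the donor's observed value minus its fitted predictive mean. The donor-pool size `k` defaults to 10.

```python
import random
import statistics


def fit_centred(z_obs, x_obs):
    """Least-squares fit of x = m + b * (z - zbar) on the complete cases."""
    n = len(z_obs)
    zbar = statistics.fmean(z_obs)
    m_hat = statistics.fmean(x_obs)  # intercept of the centred model
    szz = sum((zi - zbar) ** 2 for zi in z_obs)
    szx = sum((zi - zbar) * (xi - m_hat) for zi, xi in zip(z_obs, x_obs))
    b_hat = szx / szz
    rss = sum((xi - m_hat - b_hat * (zi - zbar)) ** 2
              for zi, xi in zip(z_obs, x_obs))
    return n, zbar, m_hat, b_hat, szz, rss


def impute_once(z, x, k=10, method="pmm", rng=random):
    """Return a copy of x with every None replaced by a PMM or LRD imputation.

    z: fully observed predictor; x: incomplete covariate (None = missing);
    k: donor-pool size (k=1 is the single-donor setting the text warns against).
    """
    if method not in ("pmm", "lrd"):
        raise ValueError("method must be 'pmm' or 'lrd'")
    obs = [i for i, xi in enumerate(x) if xi is not None]
    mis = [i for i, xi in enumerate(x) if xi is None]
    n, zbar, m_hat, b_hat, szz, rss = fit_centred([z[i] for i in obs],
                                                  [x[i] for i in obs])

    # Draw sigma^2, m, b from the usual noninformative-prior posterior so that
    # parameter uncertainty carries through to the imputations (an assumption).
    sigma2 = rss / rng.gammavariate((n - 2) / 2, 2)  # RSS / chi-squared(n - 2)
    m_star = m_hat + rng.gauss(0, (sigma2 / n) ** 0.5)
    b_star = b_hat + rng.gauss(0, (sigma2 / szz) ** 0.5)

    # Donors are matched on means from the fitted coefficients (assumed metric).
    donor_mean = {j: m_hat + b_hat * (z[j] - zbar) for j in obs}

    out = list(x)
    for i in mis:
        # Recipients use the drawn coefficients.
        rec_mean = m_star + b_star * (z[i] - zbar)
        pool = sorted(obs, key=lambda j: abs(donor_mean[j] - rec_mean))[:k]
        j = rng.choice(pool)  # sample one donor from the k nearest
        if method == "pmm":
            out[i] = x[j]  # PMM: borrow the donor's observed value
        else:
            out[i] = rec_mean + (x[j] - donor_mean[j])  # LRD: borrow its residual
    return out


if __name__ == "__main__":
    rng = random.Random(2014)
    z = [rng.gauss(0, 1) for _ in range(300)]
    x = [2 + 0.5 * zi + rng.gauss(0, 1) for zi in z]
    # Missing at random: x is more often missing when z > 0.
    x_inc = [None if rng.random() < (0.5 if zi > 0 else 0.1) else xi
             for zi, xi in zip(z, x)]
    for method in ("pmm", "lrd"):
        imps = [impute_once(z, x_inc, k=10, method=method, rng=rng) for _ in range(5)]
        print(method, [round(statistics.fmean(d), 3) for d in imps])
```

Calling `impute_once` M times on the same incomplete data yields M completed datasets. These are then analysed separately and combined with Rubin's rules. PMM can only return values that were actually observed. LRD can produce new values, because the donor's residual is added to the recipient's own predictive mean.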
We review the development of PMM and LRD, outline the various forms available, and aim to clarify some choices about how and when they should be used. We compare their performance with that of fully parametric imputation in simulation studies, first when the imputation model is correctly specified and then when it is misspecified.
When using PMM or LRD, we strongly caution against using a single donor, which is the default in some implementations. Instead, we advocate sampling from a pool of around 10 donors. We also clarify which matching metric is best. Several implementations in current MI software are poor.
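A toy example can show why a single donor is risky. It is our own illustration, not the paper's simulation design. Across imputations, a recipient's predictive mean moves only through the re-drawn model parameters. If that shift is small relative to the spacing between donors' predictive means, a pool of size 1 keeps returning the same donor. The imputations then barely differ, which tends to understate the between-imputation variance. The helper names `draw_donor` and `distinct_donors`, and the size of `jitter` standing in for the parameter draw, are hypothetical choices.

```python
import random


def draw_donor(donor_means, recipient_mean, k, rng):
    """Return one donor index drawn at random from the k nearest predictive means."""
    nearest = sorted(range(len(donor_means)),
                     key=lambda j: abs(donor_means[j] - recipient_mean))
    return rng.choice(nearest[:k])


def distinct_donors(k, m=20, n_donors=500, jitter=0.01, seed=0):
    """Count the distinct donors one recipient receives over m imputations.

    jitter (hypothetical) stands in for the shift in the recipient's predictive
    mean caused by re-drawing the imputation-model parameters each time.
    """
    rng = random.Random(seed)
    donor_means = [rng.gauss(0, 1) for _ in range(n_donors)]
    chosen = set()
    for _ in range(m):
        recipient_mean = 0.3 + rng.gauss(0, jitter)
        chosen.add(draw_donor(donor_means, recipient_mean, k, rng))
    return len(chosen)


if __name__ == "__main__":
    for k in (1, 10):
        print(f"k={k}: {distinct_donors(k)} distinct donors in 20 imputations")
```

With `jitter=0` and `k=1`, every imputation reuses exactly one donor. Enlarging the pool restores variation between imputations even when the parameter draws move the predictive mean very little.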
PMM and LRD may have a role in imputing covariates (i) that are not strongly associated with the outcome, and (ii) when the imputation model is thought to be slightly, but not grossly, misspecified. Researchers should focus their efforts on specifying the imputation model correctly, rather than expecting predictive mean matching or local residual draws to do the work.