Bartlett Jonathan W, Seaman Shaun R, White Ian R, Carpenter James R
Department of Medical Statistics, London School of Hygiene & Tropical Medicine, UK.
MRC Biostatistics Unit, Cambridge, UK.
Stat Methods Med Res. 2015 Aug;24(4):462-87. doi: 10.1177/0962280214521348. Epub 2014 Feb 12.
Missing covariate data commonly occur in epidemiological and clinical research, and are often dealt with using multiple imputation. Imputation of partially observed covariates is complicated if the substantive model is non-linear (e.g. Cox proportional hazards model), or contains non-linear (e.g. squared) or interaction terms, and standard software implementations of multiple imputation may impute covariates from models that are incompatible with such substantive models. We show how imputation by fully conditional specification, a popular approach for performing multiple imputation, can be modified so that covariates are imputed from models which are compatible with the substantive model. We investigate through simulation the performance of this proposal, and compare it with existing approaches. Simulation results suggest our proposal gives consistent estimates for a range of common substantive models, including models which contain non-linear covariate effects or interactions, provided data are missing at random and the assumed imputation models are correctly specified and mutually compatible. Stata software implementing the approach is freely available.
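As a rough illustration of what imputing a covariate "from a model compatible with the substantive model" can look like, the sketch below draws missing values of a covariate x from a density proportional to f(y | x) f(x) by rejection sampling, where f(y | x) is a substantive model containing a squared term. This is a minimal sketch under stated assumptions, not the Stata implementation mentioned in the abstract: the function name `impute_x_compatible`, the normal covariate model, the parameter values and the use of completely-at-random missingness are all hypothetical choices made for the example. For simplicity the parameters are held fixed at their true values, whereas a full fully conditional specification procedure would redraw them at each iteration and cycle over all partially observed covariates.

```python
import numpy as np


def impute_x_compatible(y, beta, sigma, mu, tau, rng, max_iter=10_000):
    """Draw one x per y from a density proportional to f(y | x) f(x).

    Assumed (hypothetical) models for this sketch:
      substantive model: y | x ~ N(beta[0] + beta[1]*x + beta[2]*x**2, sigma**2)
      covariate model:   x ~ N(mu, tau**2), used as the proposal distribution
    """
    y = np.asarray(y, dtype=float)
    x = np.empty_like(y)
    pending = np.arange(y.size)  # indices still waiting for an accepted draw
    for _ in range(max_iter):
        if pending.size == 0:
            break
        proposal = rng.normal(mu, tau, size=pending.size)
        mean_y = beta[0] + beta[1] * proposal + beta[2] * proposal**2
        # f(y | x) divided by its upper bound 1 / (sigma * sqrt(2 * pi))
        accept_prob = np.exp(-0.5 * ((y[pending] - mean_y) / sigma) ** 2)
        accepted = rng.uniform(size=pending.size) < accept_prob
        x[pending] = proposal  # latest proposal doubles as fallback if max_iter is hit
        pending = pending[~accepted]
    return x


# Simulated example: quadratic substantive model, about half of x set to missing.
rng = np.random.default_rng(2014)
n = 5000
beta_true, sigma_true = (0.0, 1.0, 0.5), 1.0
x_full = rng.normal(0.0, 1.0, size=n)
y = (beta_true[0] + beta_true[1] * x_full + beta_true[2] * x_full**2
     + rng.normal(0.0, sigma_true, size=n))
missing = rng.uniform(size=n) < 0.5  # missing completely at random, for simplicity

x_imp = x_full.copy()
x_imp[missing] = impute_x_compatible(y[missing], beta_true, sigma_true, 0.0, 1.0, rng)

# Refit the substantive model on the completed data.
design = np.column_stack([np.ones(n), x_imp, x_imp**2])
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
print("estimated coefficients:", beta_hat)
```

Because the accepted draws follow the conditional distribution implied jointly by the substantive model and the covariate model, the squared term is respected by construction. Imputing x from a linear regression on y alone would not have this property.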