MRC Biostatistics Unit, Institute of Public Health, Cambridge CB2 0SR, UK.
BMC Med Res Methodol. 2012 Apr 10;12:46. doi: 10.1186/1471-2288-12-46.
Multiple imputation is often used for missing data. When a model contains as covariates more than one function of a variable, it is not obvious how best to impute missing values in these covariates. Consider a regression with outcome Y and covariates X and X². In 'passive imputation' a value X* is imputed for X and then X² is imputed as (X*)². A recent proposal is to treat X² as 'just another variable' (JAV) and impute X and X² under multivariate normality.
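The difference between the two approaches can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names passive_impute and jav_impute are hypothetical, only X is assumed to be missing, and each function produces a single improper imputation. The regression parameters are fixed at their complete-case estimates instead of being drawn from their posterior, as proper multiple imputation would require.

```python
import numpy as np


def _ols(design, target):
    """Least-squares coefficients and residuals of target on design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef, target - design @ coef


def passive_impute(x, y, rng):
    """Impute X from a normal linear regression of X on Y, then set X^2 = (X*)^2.

    Assumption: parameters fixed at their estimates (improper single imputation).
    """
    miss = np.isnan(x)
    design = np.column_stack([np.ones_like(y), y])
    coef, resid = _ols(design[~miss], x[~miss])
    sigma = resid.std(ddof=design.shape[1])
    x_imp = x.copy()
    x_imp[miss] = design[miss] @ coef + rng.normal(0.0, sigma, miss.sum())
    return x_imp, x_imp ** 2  # X^2 is derived "passively" from X*


def jav_impute(x, y, rng):
    """'Just another variable': impute (X, X^2) jointly given Y as bivariate normal.

    The imputed X^2 is a separate draw and is NOT the square of the imputed X.
    Assumption: parameters fixed at their estimates (improper single imputation).
    """
    miss = np.isnan(x)
    design = np.column_stack([np.ones_like(y), y])
    targets = np.column_stack([x, x ** 2])  # rows with missing X stay NaN
    coef, resid = _ols(design[~miss], targets[~miss])  # coef has shape (2, 2)
    cov = np.cov(resid, rowvar=False)  # residual covariance of (X, X^2) given Y
    noise = rng.multivariate_normal(np.zeros(2), cov, size=miss.sum())
    filled = targets.copy()
    filled[miss] = design[miss] @ coef + noise
    return filled[:, 0], filled[:, 1]
```

With passive imputation the completed data always satisfy X² = (X*)². With JAV the imputed X² is its own draw, so the squared relationship does not hold in imputed rows (an imputed X² can even be negative); what JAV keeps instead is a linear model for the joint distribution of (Y, X, X²).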
We use simulation to investigate the performance of three methods that can easily be implemented in standard software: 1) linear regression of X on Y to impute X then passive imputation of X²; 2) the same regression but with predictive mean matching (PMM); and 3) JAV. We also investigate the performance of analogous methods when the analysis involves an interaction, and study the theoretical properties of JAV. The application of the methods when complete or incomplete confounders are also present is illustrated using data from the EPIC Study.
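Method 2 in the list above replaces the normal draw for X with predictive mean matching and then derives X² passively. A minimal sketch, again assuming only X is missing and fixing the regression parameters at their estimates; the function name pmm_passive_impute and the choice of k = 5 donors are illustrative assumptions, not settings taken from the study:

```python
import numpy as np


def pmm_passive_impute(x, y, rng, k=5):
    """Predictive mean matching for X given Y, then passive X^2 = (X*)^2.

    Assumptions: k = 5 donors is illustrative; regression parameters are
    fixed at their estimates (improper single imputation).
    """
    miss = np.isnan(x)
    design = np.column_stack([np.ones_like(y), y])
    coef, *_ = np.linalg.lstsq(design[~miss], x[~miss], rcond=None)
    pred = design @ coef  # predicted mean of X for every row
    pred_obs, x_obs = pred[~miss], x[~miss]
    x_imp = x.copy()
    for i in np.flatnonzero(miss):
        # the k observed cases whose predicted means are closest to row i
        donors = np.argsort(np.abs(pred_obs - pred[i]))[:k]
        x_imp[i] = x_obs[rng.choice(donors)]  # borrow one donor's observed X
    return x_imp, x_imp ** 2
```

Because every imputed X is an observed value of X, PMM cannot impute values outside the observed range, but X² is still computed from the imputed X exactly as in passive imputation.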
JAV gives consistent estimation when the analysis is linear regression with a quadratic or interaction term and X is missing completely at random. When X is missing at random, JAV may be biased, but this bias is generally less than for passive imputation and PMM. Coverage for JAV was usually good when bias was small. However, in some scenarios with a more pronounced quadratic effect, bias was large and coverage poor. When the analysis was logistic regression, JAV's performance was sometimes very poor. PMM generally improved on passive imputation, in terms of bias and coverage, but did not eliminate the bias.
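The MCAR consistency result for JAV can be checked roughly by simulation. The sketch below assumes a hypothetical data-generating model (X ~ N(0, 1), Y = 1 + X + 0.5X² + ε with ε ~ N(0, 1), and 30% of X missing completely at random), performs one improper JAV imputation, and fits the linear analysis model to the completed data. With a large sample the estimated coefficients should lie close to the true values (1, 1, 0.5). Because it uses a single imputation and no Rubin's rules, it says nothing about coverage.

```python
import numpy as np


def jav_quadratic_estimate(n=200_000, p_miss=0.3, seed=1):
    """Fit Y ~ 1 + X + X^2 after one JAV imputation of X missing completely at random.

    Hypothetical model: X ~ N(0, 1), Y = 1 + X + 0.5 X^2 + N(0, 1).
    Returns the estimated (intercept, linear, quadratic) coefficients.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 1.0 + x + 0.5 * x ** 2 + rng.normal(size=n)
    miss = rng.random(n) < p_miss  # MCAR: missingness independent of X and Y

    # JAV step 1: regress (X, X^2) jointly on Y in the complete cases.
    design = np.column_stack([np.ones(n), y])
    targets = np.column_stack([x, x ** 2])
    coef, *_ = np.linalg.lstsq(design[~miss], targets[~miss], rcond=None)
    resid = targets[~miss] - design[~miss] @ coef
    cov = np.cov(resid, rowvar=False)
    # JAV step 2: overwrite the "missing" (X, X^2) pairs with bivariate normal draws.
    filled = targets.copy()
    filled[miss] = design[miss] @ coef + rng.multivariate_normal(
        np.zeros(2), cov, size=miss.sum())

    # Analysis model: linear regression of Y on 1, X, X^2 in the completed data.
    analysis = np.column_stack([np.ones(n), filled])
    beta, *_ = np.linalg.lstsq(analysis, y, rcond=None)
    return beta
```

The check works because the fitted bivariate normal reproduces the means and covariances of (Y, X, X²), and under MCAR the complete cases estimate those moments consistently; ordinary least squares depends on the data only through them.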
Given the current state of available software, JAV is the best of a set of imperfect imputation methods for linear regression with a quadratic or interaction effect, but should not be used for logistic regression.