Am J Epidemiol. 2020 Dec 1;189(12):1628-1632. doi: 10.1093/aje/kwaa153.
In observational studies using routinely collected data, a variable with a high level of missingness or misclassification may determine whether an observation is included in the analysis. In settings where inclusion criteria are assessed after imputation, the popular multiple-imputation variance estimator proposed by Rubin ("Rubin's rules" (RR)) is biased due to incompatibility between imputation and analysis models. While alternative approaches exist, most analysts are not familiar with them. Using partially validated data from a human immunodeficiency virus cohort, we illustrate the calculation of an imputation variance estimator proposed by Robins and Wang (RW) in a scenario where the study exclusion criteria are based on a variable that must be imputed. In this motivating example, the corresponding imputation variance estimate for the log odds was 29% smaller using the RW estimator than using the RR estimator. We further compared these 2 variance estimators with a simulation study which showed that coverage probabilities of 95% confidence intervals based on the RR estimator were too high and became worse as more observations were imputed and more subjects were excluded from the analysis. The RW imputation variance estimator performed much better and should be employed when there is incompatibility between imputation and analysis models. We provide analysis code to aid future analysts in implementing this method.
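For orientation, the RR variance estimator mentioned above pools results across imputed data sets by adding the average within-imputation variance to an inflated between-imputation variance. The following is a minimal Python sketch of that standard pooling step only. It is not the RW estimator and not the authors' analysis code; the function name `rubin_pool` and the example numbers are hypothetical and chosen purely for illustration.

```python
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool m point estimates and their variances using Rubin's rules.

    estimates: per-imputation point estimates (e.g., log odds).
    variances: per-imputation variance estimates of those point estimates.
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least 2 imputations and matching lengths")

    q_bar = mean(estimates)    # pooled point estimate
    within = mean(variances)   # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)

    # Total variance: within + (1 + 1/m) * between
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical values for three imputed data sets (illustration only).
pooled, total_var = rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

When the imputation and analysis models are incompatible, as in the setting described above where exclusion criteria are applied after imputation, this total variance is the quantity the abstract reports as biased.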