Department of Biostatistics, Gillings School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
Biom J. 2022 Jun;64(5):858-862. doi: 10.1002/bimj.202100250. Epub 2022 Feb 24.
Missing data are often overcome using imputation, which leverages the entire dataset to replace missing values with informed placeholders. This method can be modified for censored data by also incorporating partial information from censored values. One such modification proposed by Atem et al. (2017, 2019a, 2019b) is conditional mean imputation where censored covariates are replaced by their conditional means given other fully observed information. These methods are robust to additional parametric assumptions on the censored covariate and utilize all available data, which is appealing. However, in implementing these methods, we discovered that these three articles provide nonequivalent formulas and, in fact, none is the correct formula for the conditional mean. Herein, we derive the correct form of the conditional mean and discuss the bias incurred when using the incorrect formulas. Furthermore, we note that even the correct formula can perform poorly for log hazard ratios far from 0. We also provide user-friendly R software, the imputeCensoRd package, to enable future researchers to tackle censored covariates correctly.
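To make the idea of conditional mean imputation concrete, the following is a minimal sketch in Python. It relies on a standard identity for a random variable X with survival function S(x) = P(X > x): E[X | X > c] = c + (1 / S(c)) ∫ from c to ∞ of S(x) dx. This is not taken from the article, and it is not the imputeCensoRd API. The function name `conditional_mean_beyond`, the exponential survival model, and the truncation point `upper` are all illustrative assumptions. In practice the survival function would be estimated conditional on the fully observed variables rather than assumed known.

```python
import math


def conditional_mean_beyond(c, surv, upper, n_steps=20000):
    """Approximate E[X | X > c] = c + (1 / S(c)) * integral_c^inf S(x) dx.

    The infinite integral is truncated at `upper` (an assumption of this
    sketch) and evaluated with the trapezoidal rule.
    """
    s_c = surv(c)
    if s_c <= 0:
        raise ValueError("S(c) must be positive to condition on X > c")
    h = (upper - c) / n_steps
    # Trapezoidal rule: half weight on the two endpoints, full weight inside.
    total = 0.5 * (surv(c) + surv(upper))
    for i in range(1, n_steps):
        total += surv(c + i * h)
    return c + h * total / s_c


# Hypothetical example: exponential covariate with rate 0.5, censored at c = 2.
rate = 0.5


def surv_exp(x):
    # Survival function of an exponential distribution (assumed model).
    return math.exp(-rate * x)


imputed = conditional_mean_beyond(c=2.0, surv=surv_exp, upper=62.0)
```

For the exponential distribution the memoryless property gives the exact answer c + 1/rate, so the numerical result can be checked against a closed form. A censored value of 2.0 would be replaced by approximately that conditional mean rather than by the censoring value itself.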