Mi Junhui, Tendulkar Rahul D, Sittenfeld Sarah M C, Patil Sujata, Zabor Emily C
Department of Quantitative Health Sciences, Cleveland Clinic Research, Cleveland, Ohio, USA.
Department of Radiation Oncology, Taussig Cancer Institute, Cleveland Clinic, Cleveland, Ohio, USA.
Stat Med. 2025 Aug;44(18-19):e70203. doi: 10.1002/sim.70203.
Methods to handle missing data have been extensively explored in the context of estimation and descriptive studies, with multiple imputation being the most widely used method in clinical research. However, in the context of clinical risk prediction models, where the goal is often to achieve high prediction accuracy and to make predictions for future patients, different considerations apply to the handling of missing covariate data. Deterministic imputation is therefore better suited to the setting of clinical risk prediction models, since the outcome is not included in the imputation model and the imputation method can be easily applied to future patients. In this paper, we provide a tutorial demonstrating how to conduct bootstrapping followed by deterministic imputation of missing covariate data to construct a clinical risk prediction model and internally validate its performance in the presence of missing data. Simulation study results are provided to help guide when imputation may be appropriate in real-world applications.
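The workflow described above can be illustrated with a minimal Python sketch under stated assumptions; it is not the authors' tutorial code. Everything specific here is a hypothetical choice for illustration: a single partially missing covariate x2 imputed by a linear regression on a fully observed covariate x1, a logistic prediction model fit by plain gradient descent, the AUC (c-statistic) as the performance measure, Harrell-style bootstrap optimism correction, and all function names (fit_imputer, impute, fit_logistic, auc, optimism_corrected_auc, simulate). The point it shows is the ordering: each bootstrap sample is drawn first, the imputation model (which never sees the outcome) is refit inside that sample, and the same fitted imputer is then applied to the original data, as it would be to future patients.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_imputer(rows):
    """Hypothetical deterministic imputation model: OLS regression of x2 on x1
    over complete cases. The outcome y is deliberately NOT used, so the fitted
    model can be applied unchanged to future patients with unknown y."""
    pairs = [(x1, x2) for x1, x2, _ in rows if x2 is not None]
    m1 = sum(p[0] for p in pairs) / len(pairs)
    m2 = sum(p[1] for p in pairs) / len(pairs)
    sxx = sum((p[0] - m1) ** 2 for p in pairs)
    sxy = sum((p[0] - m1) * (p[1] - m2) for p in pairs)
    slope = sxy / sxx if sxx > 0 else 0.0
    return m2 - slope * m1, slope  # (intercept, slope)


def impute(rows, imputer):
    # Replace each missing x2 with its single predicted value (no random draw).
    a, b = imputer
    return [(x1, a + b * x1 if x2 is None else x2, y) for x1, x2, y in rows]


def fit_logistic(rows, lr=0.5, iters=200):
    # Logistic regression on (x1, x2) by full-batch gradient descent.
    w = [0.0, 0.0, 0.0]  # intercept, x1, x2
    n = len(rows)
    for _ in range(iters):
        g = [0.0, 0.0, 0.0]
        for x1, x2, y in rows:
            err = sigmoid(w[0] + w[1] * x1 + w[2] * x2) - y
            g[0] += err
            g[1] += err * x1
            g[2] += err * x2
        w = [wi - lr * gi / n for wi, gi in zip(w, g)]
    return w


def auc(w, rows):
    # c-statistic: share of (event, non-event) pairs ranked correctly by the
    # linear predictor (ties count as one half).
    scores = [(w[0] + w[1] * x1 + w[2] * x2, y) for x1, x2, y in rows]
    pos = [s for s, y in scores if y == 1]
    neg = [s for s, y in scores if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both outcome classes")
    hits = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return hits / (len(pos) * len(neg))


def optimism_corrected_auc(rows, n_boot=50, seed=1):
    rng = random.Random(seed)
    # Apparent performance: imputer and model both fit on the full data.
    full = impute(rows, fit_imputer(rows))
    apparent = auc(fit_logistic(full), full)
    optimism = []
    for _ in range(n_boot):
        boot = [rng.choice(rows) for _ in rows]  # resample BEFORE imputing
        imp_b = fit_imputer(boot)                # refit imputer in each bootstrap sample
        boot_filled = impute(boot, imp_b)
        w_b = fit_logistic(boot_filled)
        # Original data imputed with the bootstrap imputer, as for new patients.
        test_filled = impute(rows, imp_b)
        optimism.append(auc(w_b, boot_filled) - auc(w_b, test_filled))
    return apparent, apparent - sum(optimism) / n_boot


def simulate(n=200, seed=0):
    # Hypothetical data: x2 correlated with x1, ~30% of x2 missing completely at random.
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        x2 = 0.6 * x1 + rng.gauss(0, 0.8)
        y = 1 if rng.random() < sigmoid(-0.5 + x1 + 0.8 * x2) else 0
        rows.append((x1, None if rng.random() < 0.3 else x2, y))
    return rows


if __name__ == "__main__":
    apparent, corrected = optimism_corrected_auc(simulate())
    print(f"apparent AUC {apparent:.3f}, optimism-corrected AUC {corrected:.3f}")
```

The design choice worth noting is that the imputer is fit only on covariates, so refitting it inside every bootstrap sample lets the optimism estimate capture variability from both the imputation step and the prediction model, while the fitted imputer remains usable for a new patient whose outcome is unknown. In practice one would use more covariates, a richer imputation model, and more bootstrap replicates than this sketch does.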