Evaluation of predictive model performance of an existing model in the presence of missing data.

Affiliations

Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA.

Department of Radiation Oncology, University of Michigan, Ann Arbor, Michigan, USA.

Publication information

Stat Med. 2021 Jul 10;40(15):3477-3498. doi: 10.1002/sim.8978. Epub 2021 Apr 11.

DOI:10.1002/sim.8978

PMID:33843085

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8985431/

Abstract

In medical research, the Brier score (BS) and the area under the receiver operating characteristic (ROC) curve (AUC) are two common metrics used to evaluate prediction models of a binary outcome, such as using biomarkers to predict the risk of developing a disease in the future. Assessing an existing prediction model using data with missing covariate values is challenging. In this article, we propose inverse probability weighted (IPW) and augmented inverse probability weighted (AIPW) estimates of AUC and BS to handle the missing data. An alternative approach uses multiple imputation (MI), which requires a model for the distribution of the missing variable. We evaluated the performance of IPW and AIPW in comparison with MI in simulation studies under missing completely at random, missing at random, and missing not at random scenarios. When there are missing observations in the data, MI and IPW can be used to obtain unbiased estimates of BS and AUC if the imputation model for the missing variable or the model for the missingness, respectively, is correctly specified. MI is more efficient than IPW. Our simulation results suggest that AIPW can be more efficient than IPW and is doubly robust to misspecification of either the missingness model or the imputation model. The outcome variable should be included in the model for the missing variable under all scenarios, while it only needs to be included in the missingness model if the missingness depends on the outcome. We illustrate these methods using an example from prostate cancer.
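The IPW idea in the abstract can be illustrated with a minimal sketch. It is not the authors' estimator: the abstract does not give formulas, so the normalized (Hajek-style) weighting, the pairwise form of the weighted AUC, and the names `ipw_brier`, `ipw_auc`, `observed`, and `pi` are all assumptions made here for illustration. The sketch assumes the probability `pi` that each subject has complete covariates has already been estimated from a missingness model, and that the existing model's predicted risk `p` is only available for complete cases.

```python
import numpy as np

def ipw_brier(y, p, observed, pi):
    """Weighted Brier score over complete cases, with weights 1/pi.

    Assumption: weights are normalized by their sum (Hajek-style);
    the paper may define the estimator differently.
    """
    y, p, observed, pi = map(np.asarray, (y, p, observed, pi))
    mask = observed.astype(bool)          # complete cases only
    w = 1.0 / pi[mask]                    # inverse probability weights
    return float(np.sum(w * (y[mask] - p[mask]) ** 2) / np.sum(w))

def ipw_auc(y, p, observed, pi):
    """Weighted AUC over complete case/control pairs.

    Each (case, control) pair is weighted by the product of the two
    subjects' inverse probability weights; ties count as one half.
    """
    y, p, observed, pi = map(np.asarray, (y, p, observed, pi))
    mask = observed.astype(bool)
    y_c, p_c, w = y[mask], p[mask], 1.0 / pi[mask]
    cases, controls = y_c == 1, y_c == 0
    # Pairwise differences in predicted risk: rows are cases, columns controls
    diff = p_c[cases][:, None] - p_c[controls][None, :]
    concordance = (diff > 0) + 0.5 * (diff == 0)
    pair_w = w[cases][:, None] * w[controls][None, :]
    return float(np.sum(pair_w * concordance) / np.sum(pair_w))

# Toy data (hypothetical): the third subject has a missing covariate,
# so no predicted risk is available for that subject.
y = [1, 0, 1, 0]
p = [0.8, 0.2, float("nan"), 0.9]
observed = [1, 1, 0, 1]
pi = [0.5, 1.0, 0.5, 0.5]
print(ipw_brier(y, p, observed, pi), ipw_auc(y, p, observed, pi))
```

When every `pi` equals 1 and nothing is missing, both functions reduce to the usual unweighted Brier score and AUC. An AIPW version would add an augmentation term built from the imputation model, which is not sketched here.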
