Department of Biostatistics, Brown University, Providence, Rhode Island, USA.
CAUSALab, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.
Biometrics. 2023 Sep;79(3):2382-2393. doi: 10.1111/biom.13796. Epub 2022 Nov 25.
We propose methods for estimating the area under the receiver operating characteristic (ROC) curve (AUC) of a prediction model in a target population that differs from the source population that provided the data used for original model development. If covariates that are associated with model performance, as measured by the AUC, have a different distribution in the source and target populations, then AUC estimators that only use data from the source population will not reflect model performance in the target population. Here, we provide identification results for the AUC in the target population when outcome and covariate data are available from the sample of the source population, but only covariate data are available from the sample of the target population. In this setting, we propose three estimators for the AUC in the target population and show that they are consistent and asymptotically normal. We evaluate the finite-sample performance of the estimators using simulations and use them to estimate the AUC in a nationally representative target population from the National Health and Nutrition Examination Survey for a lung cancer risk prediction model developed using source population data from the National Lung Screening Trial.
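To make the idea of transporting an AUC concrete, here is a minimal sketch of one weighting-style estimator. It is an illustrative example, not presented as any of the three estimators in the paper. It assumes that outcome risk is conditionally independent of population membership given the covariates. It also assumes that population membership follows a logistic model in the covariates and that the prediction model's risk scores for the source sample are already computed. Each source observation is weighted by its estimated inverse odds of source membership, P(S=0|X)/P(S=1|X). A weighted AUC is then computed over case/non-case pairs in the source sample. The function names (`weighted_auc`, `fit_logistic`, `transported_auc`) are hypothetical.

```python
import numpy as np


def weighted_auc(scores, y, w):
    """AUC over (case, non-case) pairs, each pair weighted by w_case * w_noncase.
    Tied scores contribute 1/2, as in the usual Mann-Whitney form of the AUC."""
    scores = np.asarray(scores, float)
    y = np.asarray(y, int)
    w = np.asarray(w, float)
    s1, w1 = scores[y == 1], w[y == 1]  # cases
    s0, w0 = scores[y == 0], w[y == 0]  # non-cases
    # Rows index cases, columns index non-cases.
    gt = (s1[:, None] > s0[None, :]).astype(float)
    eq = (s1[:, None] == s0[None, :]).astype(float)
    pair_w = w1[:, None] * w0[None, :]
    return float(np.sum(pair_w * (gt + 0.5 * eq)) / np.sum(pair_w))


def fit_logistic(X, s, n_iter=25):
    """Logistic regression of the membership indicator s on [1, X] by Newton-Raphson.
    Assumes the two samples overlap (no complete separation)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (s - p)
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


def transported_auc(scores_src, y_src, X_src, X_tgt):
    """Inverse-odds-weighted AUC for the target population, using outcomes and
    scores from the source sample and covariates from both samples."""
    X_src = np.asarray(X_src, float).reshape(len(X_src), -1)
    X_tgt = np.asarray(X_tgt, float).reshape(len(X_tgt), -1)
    X_all = np.concatenate([X_src, X_tgt], axis=0)
    s = np.concatenate([np.ones(len(X_src)), np.zeros(len(X_tgt))])  # S=1: source
    beta = fit_logistic(X_all, s)
    p_src = 1.0 / (1.0 + np.exp(-np.column_stack([np.ones(len(X_src)), X_src]) @ beta))
    w = (1.0 - p_src) / p_src  # estimated odds of target vs. source membership
    return weighted_auc(scores_src, y_src, w)
```

Because the AUC is a ratio of weighted pair sums, any constant rescaling of the weights cancels. For example, the ratio of the two sample sizes embedded in the fitted membership model does not affect the estimate. When both samples have identical covariates, the fitted membership probability is 1/2 everywhere and the weighted AUC reduces to the ordinary source-sample AUC.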