Internal and external validation of predictive models: a simulation study of bias and precision in small samples.

Authors

Steyerberg Ewout W, Bleeker Sacha E, Moll Henriëtte A, Grobbee Diederick E, Moons Karel G M

Affiliations

Center for Clinical Decision Sciences, Department of Public Health, Erasmus MC, PO Box 1738, 3000 DR Rotterdam, The Netherlands.

Publication information

J Clin Epidemiol. 2003 May;56(5):441-7. doi: 10.1016/s0895-4356(03)00047-7.

PMID:12812818

Abstract

We performed a simulation study to investigate the accuracy of bootstrap estimates of optimism (internal validation) and the precision of performance estimates in independent validation samples (external validation). We combined two data sets containing children presenting with fever without source (n=376+179=555; 120 bacterial infections). Random samples were drawn from this combined data set for the development (n=376) and validation (n=179) of logistic regression models. The models included statistically significant predictors for infection selected from a set of 57 candidate predictors. Model development, including the selection of predictors, and validation were repeated in a bootstrapping procedure. The resulting expected optimism estimate in the receiver operating characteristic (ROC) area was compared with the observed optimism according to independent validation samples. The average apparent ROC area was 0.74, which was expected (based on bootstrapping) to decrease by 0.07 to 0.67, whereas the observed decrease in the validation samples was 0.09 to 0.65. Omitting the selection of predictors from the bootstrap procedure led to a severe underestimation of the optimism (decrease 0.006). The standard error of the observed ROC area in the independent validation samples was large (0.05). We recommend bootstrapping for internal validation because it gives reasonably valid estimates of the expected optimism in predictive performance provided that any selection of predictors is taken into account. For external validation, substantial sample sizes should be used for sufficient power to detect clinically important changes in performance as compared with the internally validated estimate.
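The bootstrap optimism correction described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the data are synthetic, and `fit` is a hypothetical stand-in that keeps predictors whose univariable ROC area is far from 0.5, rather than a logistic regression with significance-based selection. All function names, the threshold, and the sample sizes are assumptions chosen to keep the example short and dependency-free. The `reselect` flag switches between repeating the selection in every bootstrap sample and reusing the predictors chosen in the full sample, which is the shortcut the abstract warns against.

```python
import random

def auc(scores, labels):
    """ROC area: chance that a random event outscores a random non-event (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def fit(X, y, keep=None, threshold=0.08):
    """Hypothetical stand-in for 'select predictors, then fit a model'.
    With keep=None, predictors are selected by univariable ROC area;
    otherwise only the weights of the given predictors are re-estimated."""
    columns = range(len(X[0])) if keep is None else keep
    effects = {j: auc([row[j] for row in X], y) - 0.5 for j in columns}
    if keep is None:
        chosen = [j for j in effects if abs(effects[j]) > threshold]
        if not chosen:  # always keep at least the strongest predictor
            chosen = [max(effects, key=lambda j: abs(effects[j]))]
    else:
        chosen = list(keep)
    return {j: effects[j] for j in chosen}

def predict(model, X):
    """Linear score using the selected predictors and their weights."""
    return [sum(w * row[j] for j, w in model.items()) for row in X]

def bootstrap_optimism(X, y, n_boot, rng, reselect=True):
    """Apparent ROC area, bootstrap estimate of optimism, and corrected ROC area."""
    n = len(y)
    full_model = fit(X, y)
    apparent = auc(predict(full_model, X), y)
    total = 0.0
    for _ in range(n_boot):
        while True:  # redraw if the bootstrap sample lacks one of the classes
            idx = [rng.randrange(n) for _ in range(n)]
            yb = [y[i] for i in idx]
            if 0 < sum(yb) < n:
                break
        Xb = [X[i] for i in idx]
        keep = None if reselect else list(full_model)
        model = fit(Xb, yb, keep=keep)
        # optimism = performance in the bootstrap sample minus performance in the original sample
        total += auc(predict(model, Xb), yb) - auc(predict(model, X), y)
    optimism = total / n_boot
    return {"apparent": apparent, "optimism": optimism, "corrected": apparent - optimism}

def simulate(n, n_pred, rng):
    """Synthetic data: predictors 0 and 1 carry signal, the rest are noise."""
    X = [[rng.gauss(0, 1) for _ in range(n_pred)] for _ in range(n)]
    y = [1 if 0.6 * row[0] + 0.6 * row[1] + rng.gauss(0, 1) > 1.0 else 0 for row in X]
    return X, y

rng = random.Random(2003)
X, y = simulate(120, 15, rng)
for reselect in (True, False):
    result = bootstrap_optimism(X, y, 40, rng, reselect=reselect)
    print("selection repeated:", reselect, {k: round(v, 3) for k, v in result.items()})
```

The corrected ROC area is the apparent value minus the average optimism, mirroring the abstract's 0.74 - 0.07 = 0.67. Comparing the two runs shows how the estimated optimism changes when the selection step is left out of the bootstrap loop.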

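The reported standard error of 0.05 for the ROC area in validation samples of 179 children can be checked against a standard analytic approximation. The sketch below uses the Hanley and McNeil (1982) formula, which is an assumption here: the study obtained its standard error empirically from repeated sampling, not from this formula. The number of events in the validation sample is also assumed, taken as the overall prevalence (120/555) applied to n = 179.

```python
import math

def hanley_mcneil_se(auc, n_events, n_nonevents):
    """Approximate standard error of a ROC area (Hanley & McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_events - 1) * (q1 - auc ** 2)
        + (n_nonevents - 1) * (q2 - auc ** 2)
    ) / (n_events * n_nonevents)
    return math.sqrt(variance)

# Assumption: events in the validation sample follow the overall prevalence 120/555.
n_validation = 179
n_events = round(n_validation * 120 / 555)  # 39 events, 140 non-events
se = hanley_mcneil_se(0.65, n_events, n_validation - n_events)
print(f"n={n_validation}: SE of ROC area = {se:.3f}")  # about 0.052

# How the approximate SE shrinks as the validation sample grows (same prevalence assumed).
for n in (179, 400, 800, 1600):
    events = round(n * 120 / 555)
    print(n, round(hanley_mcneil_se(0.65, events, n - events), 3))
```

Under these assumptions the approximation gives about 0.052, in line with the reported 0.05. A standard error of this size is large relative to the 0.02 gap between the bootstrap-corrected estimate (0.67) and the observed validation value (0.65), which is why the abstract calls for substantial external validation samples.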