Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, the Netherlands.
Am J Epidemiol. 2010 Oct 15;172(8):971-80. doi: 10.1093/aje/kwq223. Epub 2010 Aug 31.
Various performance measures related to calibration and discrimination are available for the assessment of risk models. When the validity of a risk model is assessed in a new population, estimates of the model's performance can be influenced in several ways. The regression coefficients may be incorrect, which results in an invalid model. However, the distribution of patient characteristics (case mix) may also influence the model's performance. Here the authors consider a number of typical situations that can be encountered in external validation studies. The theoretical relations between differences in development and validation samples, on the one hand, and the performance measures, on the other, are studied by simulation. Benchmark values for the performance measures are proposed to disentangle a case-mix effect from incorrect regression coefficients when interpreting the model's estimated performance in validation samples. The authors demonstrate the use of the benchmark values with data on traumatic brain injury obtained from the International Tirilazad Trial and the North American Tirilazad Trial (1991-1994).
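The following is a minimal sketch of the idea behind a benchmark value for discrimination. It assumes that the benchmark is the c statistic one would expect in the validation sample if the model's coefficients were correct, estimated by simulating outcomes from the model's own predicted risks so that only the case mix (the spread of predicted risks) drives the result. The function names `c_statistic` and `benchmark_c_statistic`, the 200 simulation draws, and the fixed seed are illustrative assumptions and do not come from the source.

```python
import random
from typing import Sequence


def c_statistic(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Concordance (c) statistic: probability that a randomly chosen event
    has a higher predicted risk than a randomly chosen non-event (ties = 1/2)."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                score += 1.0
            elif e == n:
                score += 0.5
    return score / (len(events) * len(non_events))


def benchmark_c_statistic(risks: Sequence[float], n_sim: int = 200, seed: int = 0) -> float:
    """Hypothetical benchmark: the average c statistic obtained when outcomes
    are drawn as Bernoulli(p_i) from the predicted risks themselves, i.e. the
    discrimination expected in this sample if the model were correct."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_sim):
        simulated = [1 if rng.random() < p else 0 for p in risks]
        # Skip draws with only events or only non-events (c is undefined).
        if 0 < sum(simulated) < len(simulated):
            values.append(c_statistic(risks, simulated))
    if not values:
        raise ValueError("all simulated draws were degenerate")
    return sum(values) / len(values)


# Usage: a heterogeneous case mix supports a much higher c statistic
# than a homogeneous one, even for a perfectly specified model.
wide = [0.05] * 10 + [0.95] * 10
narrow = [0.3] * 10 + [0.5] * 10
print(round(benchmark_c_statistic(wide), 2), round(benchmark_c_statistic(narrow), 2))
```

Under these assumptions, an observed validation c statistic close to the benchmark suggests that a drop in discrimination relative to the development sample may reflect a more homogeneous case mix, whereas an observed value well below the benchmark points more toward incorrect regression coefficients.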