Nieboer Daan, van der Ploeg Tjeerd, Steyerberg Ewout W
Department of Public Health, Erasmus MC-University Medical Center, Rotterdam, the Netherlands.
Department of Science, Medical Center Alkmaar/Inholland University, Alkmaar, the Netherlands.
PLoS One. 2016 Feb 16;11(2):e0148820. doi: 10.1371/journal.pone.0148820. eCollection 2016.
External validation studies are essential to assess the generalizability of prediction models. Recently, a permutation test focusing on discrimination, as quantified by the c-statistic, was proposed to judge whether a prediction model is transportable to a new setting. We aimed to evaluate this test and compare it with previously proposed procedures for judging changes in the c-statistic from the development to the external validation setting.
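As a concrete illustration, the sketch below implements one plausible form of such a permutation test: development and validation patients are pooled, repeatedly reshuffled into two sets of the original sizes, and the validation c-statistic is recomputed for each reshuffle to form a reference distribution. This is an assumption-laden reconstruction, not the exact procedure of the cited test; the helper names (`validation_c`, `permutation_test`) are illustrative.

```python
# Minimal sketch of a permutation test for the validation c-statistic.
# Assumption (not taken verbatim from the paper): the test pools the two
# sets, reshuffles patients, refits the model on each permuted "development"
# set, and compares the observed validation c against the null distribution.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def validation_c(X_dev, y_dev, X_val, y_val):
    """Fit on the development set; return the c-statistic (AUC) at validation."""
    model = LogisticRegression().fit(X_dev, y_dev)
    return roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])

def permutation_test(X_dev, y_dev, X_val, y_val, n_perm=1000, seed=0):
    rng = np.random.default_rng(seed)
    observed = validation_c(X_dev, y_dev, X_val, y_val)
    X = np.vstack([X_dev, X_val])
    y = np.concatenate([y_dev, y_val])
    n_dev = len(y_dev)
    c_perm = np.empty(n_perm)
    for i in range(n_perm):
        idx = rng.permutation(len(y))          # random reallocation of patients
        d, v = idx[:n_dev], idx[n_dev:]
        c_perm[i] = validation_c(X[d], y[d], X[v], y[v])
    # Two-sided permutation p-value, centered on the permutation mean
    p = np.mean(np.abs(c_perm - c_perm.mean()) >= np.abs(observed - c_perm.mean()))
    return observed, c_perm, p
```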
We compared the permutation test with benchmark values of the c-statistic derived from a previously proposed framework for judging the transportability of a prediction model. In a simulation study, we developed a prediction model with logistic regression on a development set and validated it in a validation set. We concentrated on two scenarios: 1) the case-mix was more heterogeneous and predictor effects were weaker in the validation set than in the development set, and 2) the case-mix was less heterogeneous in the validation set and predictor effects were identical in the validation and development sets. Furthermore, we illustrated the methods in a case study using 15 datasets of patients with traumatic brain injury.
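A hypothetical data-generating sketch for the two scenarios follows. The abstract does not report the actual simulation parameters, so the sample sizes, standard deviations, and coefficients below are illustrative only: case-mix heterogeneity is varied through the predictor's standard deviation, and predictor strength through its coefficient.

```python
# Illustrative (not the paper's) data-generating process for both scenarios,
# using a single predictor and a logistic outcome model.
import numpy as np

def simulate(n, sd_x, beta, intercept=0.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sd_x, size=(n, 1))           # case-mix: SD of predictor
    p = 1.0 / (1.0 + np.exp(-(intercept + beta * x[:, 0])))
    y = rng.binomial(1, p)                           # binary outcome
    return x, y

# Scenario 1: more heterogeneous case-mix, weaker effect at validation
X_dev, y_dev = simulate(500, sd_x=1.0, beta=1.0, seed=1)
X_val, y_val = simulate(500, sd_x=1.5, beta=0.6, seed=2)

# Scenario 2: less heterogeneous case-mix, identical effect at validation
X_val2, y_val2 = simulate(500, sd_x=0.7, beta=1.0, seed=3)
```

The permutation test sketched earlier can then be applied directly, e.g. `permutation_test(X_dev, y_dev, X_val, y_val)`.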
The permutation test indicated that the validation and development sets were homogeneous in scenario 1 (in almost all simulated samples) and heterogeneous in scenario 2 (in 17%-39% of simulated samples). Previously proposed benchmark values of the c-statistic and the standard deviation of the linear predictor correctly pointed at the more heterogeneous case-mix in scenario 1 and the less heterogeneous case-mix in scenario 2.
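The benchmark quantities mentioned here can be approximated as in the sketch below; this reflects one reading of the benchmark framework rather than its exact definitions, and `case_mix_sd` and `model_based_c` are hypothetical helper names. The standard deviation of the linear predictor summarizes case-mix heterogeneity, and a model-based benchmark c-statistic estimates the discrimination expected if the development coefficients were correct for the validation case-mix.

```python
# Sketch of two case-mix diagnostics (assumed, simplified forms).
import numpy as np
from sklearn.metrics import roc_auc_score

def linear_predictor(model, X):
    # For sklearn's LogisticRegression, decision_function returns b0 + X @ b
    return model.decision_function(X)

def case_mix_sd(model, X_dev, X_val):
    """SD of the linear predictor in each set: a case-mix heterogeneity summary."""
    return np.std(linear_predictor(model, X_dev)), np.std(linear_predictor(model, X_val))

def model_based_c(model, X_val, n_rep=200, seed=0):
    """Benchmark c: simulate outcomes from the model's own predicted risks
    in the validation case-mix, then average the resulting c-statistics."""
    rng = np.random.default_rng(seed)
    p = model.predict_proba(X_val)[:, 1]
    cs = [roc_auc_score(rng.binomial(1, p), p) for _ in range(n_rep)]
    return float(np.mean(cs))
```

Comparing the observed validation c-statistic with such a benchmark, rather than with the development c-statistic alone, separates a drop due to a narrower case-mix from a drop due to invalid coefficients.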
The recently proposed permutation test may give misleading results when externally validating prediction models in the presence of case-mix differences between the development and validation populations. To correctly interpret the c-statistic found at external validation, it is crucial to disentangle case-mix differences from incorrect regression coefficients.