Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, University Campus, P.O. Box 1186, 45110 Ioannina, Greece.
Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, University Campus, P.O. Box 1186, 45110 Ioannina, Greece; Department of Epidemiology and Biostatistics, Imperial College London, Norfolk Place W2 1PG, London, United Kingdom.
J Clin Epidemiol. 2015 Jan;68(1):25-34. doi: 10.1016/j.jclinepi.2014.09.007. Epub 2014 Oct 23.
OBJECTIVES: To evaluate how often newly developed risk prediction models undergo external validation and how well they perform in such validations.
STUDY DESIGN AND SETTING: We reviewed derivation studies of newly proposed risk models and their subsequent external validations. We extracted study characteristics, outcome(s), and the models' discriminatory performance [area under the curve (AUC)] in derivation and validation studies. We estimated the probability of a model having an external validation and the change in discriminatory performance, relative to the derivation estimates, under increasingly stringent external validation (by overlapping or by entirely different authors).
RESULTS: We evaluated 127 new prediction models. For 32 of these models (25%), at least one external validation study was identified; for 22 models (17%), the validation had been done by entirely different authors. The probability of having an external validation by different authors within 5 years was 16%. AUC estimates decreased significantly on external validation compared with the derivation study [median AUC change: -0.05 (P < 0.001) overall; -0.04 (P = 0.009) for validation by overlapping authors; -0.05 (P < 0.001) for validation by different authors]. On external validation, AUC decreased by at least 0.03 in 19 models and increased by at least 0.03 in none (P < 0.001).
CONCLUSION: External independent validation of predictive models in different studies is uncommon. Predictive performance may worsen substantially on external validation.
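As an illustration of the last result (19 models with an AUC decrease of at least 0.03 versus none with an increase of that size), the comparison can be sketched as an exact two-sided sign test. The abstract does not name the test behind the reported P < 0.001, so the choice of test and the helper `sign_test_p` are assumptions made here for illustration only. Models whose AUC changed by less than 0.03 in either direction are left out of this sketch.

```python
from math import comb


def sign_test_p(n_decrease: int, n_increase: int) -> float:
    """Exact two-sided sign test p-value.

    Null hypothesis: a decrease and an increase are equally likely
    (each with probability 0.5). Hypothetical helper, not from the paper.
    """
    n = n_decrease + n_increase
    k = min(n_decrease, n_increase)
    # Lower tail P(X <= k) for X ~ Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the tail for a two-sided test and cap at 1
    return min(1.0, 2 * tail)


# Counts taken from the abstract: 19 decreases, 0 increases of at least 0.03
p = sign_test_p(19, 0)
print(f"{p:.2e}")  # prints 3.81e-06
```

Under these assumptions the p-value is 2 × 0.5^19, far below 0.001, which is consistent with the reported result.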