Department of Development and Regeneration, KU Leuven, Herestraat 49 box 805, 3000, Leuven, Belgium.
Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, Netherlands.
BMC Med. 2019 Dec 16;17(1):230. doi: 10.1186/s12916-019-1466-7.
The assessment of the calibration performance of risk prediction models, whether based on regression or on more flexible machine learning algorithms, receives little attention.
Herein, we argue that this needs to change immediately, because poorly calibrated algorithms can be misleading and potentially harmful for clinical decision-making. We summarize how to avoid poor calibration at algorithm development and how to assess calibration at algorithm validation, emphasizing the balance between model complexity and the available sample size. At external validation, calibration curves require sufficiently large samples. Algorithm updating should be considered to ensure appropriate support of clinical practice.
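As an illustrative aside (not part of the original abstract), the calibration intercept and calibration slope commonly reported at external validation can be estimated by regressing the observed binary outcomes on the log-odds of the predicted risks. The Python sketch below assumes a binary outcome and uses hypothetical variable names (`y_true` for observed outcomes, `y_prob` for predicted risks):

```python
import numpy as np
import statsmodels.api as sm

def calibration_intercept_slope(y_true, y_prob, eps=1e-8):
    """Estimate the calibration intercept and slope on validation data."""
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    lp = np.log(p / (1 - p))  # log-odds (logit) of the predicted risks

    # Calibration slope: logistic regression of the outcome on the logit
    # of the predicted risk. A slope below 1 suggests predictions that are
    # too extreme, as typically produced by overfitted models.
    slope_fit = sm.GLM(np.asarray(y_true), sm.add_constant(lp),
                       family=sm.families.Binomial()).fit()

    # Calibration intercept ("calibration-in-the-large"): the logit enters
    # as a fixed offset so that only the intercept is estimated; values far
    # from 0 indicate systematic over- or under-prediction.
    intercept_fit = sm.GLM(np.asarray(y_true), np.ones_like(lp),
                           family=sm.families.Binomial(), offset=lp).fit()

    return intercept_fit.params[0], slope_fit.params[1]
```

These summary measures complement, but do not replace, a calibration curve of observed versus predicted risk, which, as noted above, requires a sufficiently large validation sample to be reliable.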
Efforts are required to avoid poor calibration when developing prediction models, to evaluate calibration when validating models, and to update models when indicated. The ultimate aim is to optimize the utility of predictive analytics for shared decision-making and patient counseling.
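Where updating is indicated, the simplest form is logistic recalibration: the original predictions are shifted and scaled on the log-odds scale using an intercept and slope estimated on local validation data (for example, the two coefficients of the joint logistic fit, `slope_fit.params`, in the sketch above). A minimal sketch, again with hypothetical names:

```python
import numpy as np
from scipy.special import expit, logit

def recalibrate(y_prob, intercept, slope, eps=1e-8):
    """Logistic recalibration: shift and scale predictions on the log-odds scale."""
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return expit(intercept + slope * logit(p))
```

More extensive updating, such as re-estimating individual coefficients or refitting the model, may be warranted when miscalibration is not adequately captured by an intercept and slope alone.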