Department of Development and Regeneration, KU Leuven, Herestraat 49 box 805, 3000, Leuven, Belgium.
Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, Netherlands.
BMC Med. 2019 Dec 16;17(1):230. doi: 10.1186/s12916-019-1466-7.
The assessment of the calibration performance of risk prediction models, whether based on regression or on more flexible machine learning algorithms, receives little attention.
Herein, we argue that this needs to change immediately, because poorly calibrated algorithms can be misleading and potentially harmful for clinical decision-making. We summarize how to avoid poor calibration at algorithm development and how to assess calibration at algorithm validation, emphasizing the balance between model complexity and the available sample size. At external validation, calibration curves require sufficiently large samples. Algorithm updating should be considered to ensure appropriate support of clinical practice.
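As an illustrative aside (not part of the original abstract), the calibration intercept and calibration slope commonly reported at external validation can be estimated by regressing the observed binary outcomes on the log-odds of the predicted risks. The Python sketch below assumes a binary outcome and uses hypothetical variable names (`y_true` for observed outcomes, `y_prob` for predicted risks):

```python
import numpy as np
import statsmodels.api as sm

def calibration_intercept_slope(y_true, y_prob, eps=1e-8):
    """Estimate the calibration intercept and slope on validation data."""
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    lp = np.log(p / (1 - p))  # log-odds (logit) of the predicted risks

    # Calibration slope: logistic regression of the outcome on the logit
    # of the predicted risk. A slope below 1 suggests predictions that are
    # too extreme, as typically produced by overfitted models.
    slope_fit = sm.GLM(np.asarray(y_true), sm.add_constant(lp),
                       family=sm.families.Binomial()).fit()

    # Calibration intercept ("calibration-in-the-large"): the logit enters
    # as a fixed offset so that only the intercept is estimated; values far
    # from 0 indicate systematic over- or under-prediction.
    intercept_fit = sm.GLM(np.asarray(y_true), np.ones_like(lp),
                           family=sm.families.Binomial(), offset=lp).fit()

    return intercept_fit.params[0], slope_fit.params[1]
```

These summary measures complement, but do not replace, a calibration curve of observed versus predicted risk, which, as noted above, requires a sufficiently large validation sample to be reliable.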
Efforts are required to avoid poor calibration when developing prediction models, to evaluate calibration when validating models, and to update models when indicated. The ultimate aim is to optimize the utility of predictive analytics for shared decision-making and patient counseling.
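Where updating is indicated, the simplest form is logistic recalibration: the original predictions are shifted and scaled on the log-odds scale using an intercept and slope estimated on local validation data (for example, the two coefficients of the joint logistic fit, `slope_fit.params`, in the sketch above). A minimal sketch, again with hypothetical names:

```python
import numpy as np
from scipy.special import expit, logit

def recalibrate(y_prob, intercept, slope, eps=1e-8):
    """Logistic recalibration: shift and scale predictions on the log-odds scale."""
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return expit(intercept + slope * logit(p))
```

More extensive updating, such as re-estimating individual coefficients or refitting the model, may be warranted when miscalibration is not adequately captured by an intercept and slope alone.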