
The Harms of Class Imbalance Corrections for Machine Learning Based Prediction Models: A Simulation Study.

Author Information

Carriero Alex, Luijken Kim, de Hond Anne, Moons Karel G M, van Calster Ben, van Smeden Maarten

Affiliations

Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands.

KU Leuven, Leuven, Belgium.

Publication Information

Stat Med. 2025 Feb 10;44(3-4):e10320. doi: 10.1002/sim.10320.

Abstract

INTRODUCTION

Risk prediction models are increasingly used in healthcare to aid in clinical decision-making. In most clinical contexts, model calibration (i.e., assessing the reliability of risk estimates) is critical. Data available for model development are often not perfectly balanced with respect to the modeled outcome (i.e., individuals with vs. without the event of interest are not equally prevalent in the data). It is common for researchers to correct for class imbalance, yet the effect of such corrections on the calibration of machine learning models is largely unknown.

METHODS

We studied the effect of imbalance corrections on model calibration for a variety of machine learning algorithms. Using extensive Monte Carlo simulations, we compared the out-of-sample predictive performance of models developed with an imbalance correction to those developed without one, across different data-generating scenarios (varying sample size, number of predictors, and event fraction). Our findings were illustrated in a case study using MIMIC-III data.
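The core comparison in such a simulation can be sketched in a minimal, stdlib-only Python toy: one predictor, an assumed true model logit(p) = -2.2 + x (giving an imbalanced event fraction of roughly 12%), random oversampling of events as the imbalance correction, and calibration-in-the-large (mean predicted risk vs. observed event fraction) as the performance measure. This is an illustration of the study design, not the authors' simulation code.

```python
import math
import random

random.seed(7)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, iters=2000):
    """Plain gradient-descent logistic regression with one predictor."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(iters):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

def simulate(n, b0=-2.2, b1=1.0):
    """Imbalanced dataset; true risk is sigmoid(b0 + b1*x)."""
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if random.random() < sigmoid(b0 + b1 * x) else 0 for x in xs]
    return xs, ys

xs, ys = simulate(800)

# Model A: fit on the data as-is (no imbalance correction).
b0_a, b1_a = fit_logistic(xs, ys)

# Model B: random oversampling of events to a roughly 50/50 balance,
# a common class-imbalance correction.
events = [(x, y) for x, y in zip(xs, ys) if y == 1]
nonevents = [(x, y) for x, y in zip(xs, ys) if y == 0]
balanced = nonevents + events * max(1, len(nonevents) // len(events))
b0_b, b1_b = fit_logistic([x for x, _ in balanced], [y for _, y in balanced])

# Out-of-sample calibration-in-the-large: mean predicted risk vs.
# observed event fraction on fresh data from the same mechanism.
xt, yt = simulate(2000)
observed = sum(yt) / len(yt)
mean_a = sum(sigmoid(b0_a + b1_a * x) for x in xt) / len(xt)
mean_b = sum(sigmoid(b0_b + b1_b * x) for x in xt) / len(xt)
print(f"observed event fraction:            {observed:.3f}")
print(f"mean predicted risk, no correction: {mean_a:.3f}")
print(f"mean predicted risk, oversampled:   {mean_b:.3f}")
```

The uncorrected model tracks the observed event fraction, while the oversampled model predicts risks near 50% on average, far above the true rate; the full study repeats comparisons of this kind across many algorithms, sample sizes, and event fractions.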

RESULTS

In all simulation scenarios, prediction models developed without a correction for class imbalance consistently had equal or better calibration performance than prediction models developed with a correction for class imbalance. The miscalibration introduced by correcting for class imbalance was characterized by an over-estimation of risk and could not always be corrected by re-calibration.
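For logistic models the reported over-estimation has a simple asymptotic explanation: sampling classes to a 50/50 balance is equivalent to shifting the intercept by the log of the sampling odds (the standard prior-correction result from case-control sampling). A minimal sketch, with an assumed true model logit(p) = -2.2 + x and an original event fraction of 10%:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Assumed true model for illustration: logit(p) = b0 + b1 * x,
# giving roughly a 10% event risk at x = 0.
b0, b1 = -2.2, 1.0
pi = 0.10  # original event fraction

# Balancing the classes to 50/50 inflates the fitted intercept by
# log((1 - pi) / pi) relative to the true model.
shift = math.log((1 - pi) / pi)

for x in (0.0, 1.0):
    p_true = sigmoid(b0 + b1 * x)              # true risk
    p_balanced = sigmoid(b0 + shift + b1 * x)  # risk after balancing
    print(f"x={x}: true {p_true:.3f}, after balancing {p_balanced:.3f}")
```

At x = 0 the balanced model returns a risk near 0.5 instead of near 0.1. For plain logistic regression this intercept shift could in principle be subtracted back out, but, as the results above note, re-calibration did not always repair the damage.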

CONCLUSION

Correcting for class imbalance is not always necessary and may even be harmful to clinical prediction models that aim to produce reliable risk estimates on an individual basis.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac2a/11771573/eaad179de117/SIM-44-0-g002.jpg
