The harm of class imbalance corrections for risk prediction models: illustration and simulation using logistic regression.

Affiliations

Julius Center for Health Sciences and Primary Care, UMC Utrecht, Utrecht University, Utrecht, The Netherlands.

Department of Development and Regeneration, KU Leuven, Leuven, Belgium.

Publication information

J Am Med Inform Assoc. 2022 Aug 16;29(9):1525-1534. doi: 10.1093/jamia/ocac093.

Abstract

OBJECTIVE

Methods to correct class imbalance (imbalance between the frequency of outcome events and nonevents) are receiving increasing interest for developing prediction models. We examined the effect of imbalance correction on the performance of logistic regression models.

MATERIAL AND METHODS

Prediction models were developed using standard and penalized (ridge) logistic regression under 4 methods to address class imbalance: no correction, random undersampling, random oversampling, and SMOTE. Model performance was evaluated in terms of discrimination, calibration, and classification. Using Monte Carlo simulations, we studied the impact of training set size, number of predictors, and the outcome event fraction. A case study on prediction modeling for ovarian cancer diagnosis is presented.

RESULTS

The use of random undersampling, random oversampling, or SMOTE yielded poorly calibrated models: the probability of belonging to the minority class was strongly overestimated. These methods did not result in higher areas under the ROC curve when compared with models developed without correction for class imbalance. Although imbalance correction improved the balance between sensitivity and specificity, similar results were obtained by shifting the probability threshold instead.
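The calibration and threshold findings can be illustrated with a small simulation. The following is a minimal sketch and not the authors' simulation code: it assumes a single standard-normal predictor, hypothetical true coefficients (intercept -3, slope 1), unpenalized logistic regression fitted by Newton-Raphson, and random undersampling only. All function names are illustrative. With one predictor, both models rank patients identically, so the sketch looks only at calibration and classification thresholds, not at the area under the ROC curve.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def simulate(n, rng, intercept=-3.0, slope=1.0):
    """Draw (x, y) pairs from a one-predictor logistic model (hypothetical coefficients)."""
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < sigmoid(intercept + slope * x) else 0
        data.append((x, y))
    return data

def fit_logistic(data, iterations=15):
    """Unpenalized logistic regression with one predictor, fitted by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in data:
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p              # gradient w.r.t. intercept
            g1 += (y - p) * x        # gradient w.r.t. slope
            h00 += w                 # entries of the 2x2 information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def undersample(data, rng):
    """Keep every event and an equally sized random subset of nonevents."""
    events = [d for d in data if d[1] == 1]
    nonevents = [d for d in data if d[1] == 0]
    return events + rng.sample(nonevents, len(events))

def mean_risk(model, data):
    b0, b1 = model
    return sum(sigmoid(b0 + b1 * x) for x, _ in data) / len(data)

def sens_spec(model, data, threshold):
    """Sensitivity and specificity when predicting an event if risk >= threshold."""
    b0, b1 = model
    tp = fn = tn = fp = 0
    for x, y in data:
        positive = sigmoid(b0 + b1 * x) >= threshold
        if y == 1 and positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

def run_demo(seed=1, n_train=20000, n_test=20000):
    rng = random.Random(seed)
    train = simulate(n_train, rng)
    test = simulate(n_test, rng)
    train_fraction = sum(y for _, y in train) / len(train)
    uncorrected = fit_logistic(train)
    undersampled = fit_logistic(undersample(train, rng))
    return {
        "event_fraction": sum(y for _, y in test) / len(test),
        "mean_risk_uncorrected": mean_risk(uncorrected, test),
        "mean_risk_undersampled": mean_risk(undersampled, test),
        "uncorrected_at_0.5": sens_spec(uncorrected, test, 0.5),
        "undersampled_at_0.5": sens_spec(undersampled, test, 0.5),
        # Threshold shifted to the training event fraction instead of resampling.
        "uncorrected_shifted": sens_spec(uncorrected, test, train_fraction),
    }

results = run_demo()
for name, value in results.items():
    print(name, value)
```

Under these assumptions, the mean predicted risk of the uncorrected model should track the observed event fraction, while the undersampled model's mean predicted risk should be several times higher. The sensitivity and specificity of the undersampled model at a 0.5 threshold should be close to those of the uncorrected model with the threshold moved to the training event fraction.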

DISCUSSION

Imbalance correction led to models with strong miscalibration without better ability to distinguish between patients with and without the outcome event. The inaccurate probability estimates reduce the clinical utility of the model, because decisions about treatment are ill-informed.
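One way to see where the overestimation comes from, for random undersampling specifically, is the standard odds argument below. It is a textbook result about outcome-dependent sampling, not a derivation taken from the abstract, and the symbols $\pi(x)$, $\pi^{*}(x)$, $r$, $n_0$, $n_1$ are notation introduced here. Suppose all events are kept and each nonevent is kept independently with probability $r$, and let $\pi(x) = P(Y=1 \mid x)$ be the risk in the original population and $\pi^{*}(x)$ the risk in the resampled data.

```latex
\frac{\pi^{*}(x)}{1-\pi^{*}(x)}
  = \frac{1}{r}\cdot\frac{\pi(x)}{1-\pi(x)}
\quad\Longrightarrow\quad
\operatorname{logit}\pi^{*}(x) = \operatorname{logit}\pi(x) + \log\frac{1}{r}.
% Balancing the classes means r = n_1 / n_0 (events / nonevents), so the
% intercept shifts by \log(n_0 / n_1) while the slopes are unchanged.
% Example with an assumed event fraction of 10%: \log(0.9/0.1) = \log 9 \approx 2.20.
```

Because only the intercept moves, the ranking of patients is unchanged, which is consistent with the reported lack of improvement in discrimination, while every predicted probability is pushed upward on the logit scale.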

CONCLUSION

Outcome imbalance is not a problem in itself; imbalance correction may even worsen model performance.
