Evaluating the three-level approach of the U-smile method for imbalanced binary classification.

Author information

Więckowska Barbara, Kubiak Katarzyna B, Guzik Przemysław

Affiliations

Department of Computer Science and Statistics, Poznan University of Medical Sciences, Poznan, Poland.

Department of Cardiology - Intensive Therapy and Internal Medicine, Poznan University of Medical Sciences, Poznan, Poland.

Publication information

PLoS One. 2025 Apr 10;20(4):e0321661. doi: 10.1371/journal.pone.0321661. eCollection 2025.

Abstract

Real-life binary classification problems often involve imbalanced datasets, where the majority class outnumbers the minority class. We previously developed the U-smile method, which comprises the U-smile plot and the BA, RB and I coefficients, to assess the usefulness of a new variable added to a reference prediction model and validated it under class balance. In this study, we evaluated the U-smile method under class imbalance, proposed a three-level approach of the U-smile method, and used the I coefficients as a weighting factor for point size in the U-smile plots of the BA and RB coefficients. Using real data from the Heart Disease dataset and generated random variables, we built logistic regression models to assess four new variables added to the reference model (nested setting). These models were evaluated at seven pre-defined imbalance levels of 1%, 10%, 30%, 50%, 70%, 90% and 99% of the event class. The results of the U-smile method were compared to those of certain traditional measures: Brier skill score, net reclassification index, difference in F1-score, difference in Matthews correlation coefficient, difference in the area under the receiver operating characteristic curve of the new and reference models, and the likelihood-ratio test. The reference model overfitted to the majority class at higher imbalance levels. The BA-RB-I coefficients of the U-smile method identified informative variables across the entire imbalance range. At higher imbalance levels, the U-smile method indicated both prediction improvement in the minority class (positive BA and I coefficients) and reduction in overfitting to the majority class (negative RB coefficients). The U-smile method outperformed traditional evaluation measures across most of the imbalance range. It proved highly effective in variable selection for imbalanced binary classification, making it a useful tool for real-life problems, where imbalanced datasets are prevalent.
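As a rough illustration of the traditional comparators listed in the abstract, the sketch below computes the Brier skill score, the difference in F1-score and the difference in Matthews correlation coefficient between a reference and a new model on an imbalanced toy sample. It does not implement the U-smile BA, RB or I coefficients, whose definitions are not given here. The function names, the 0.5 classification threshold, the form BSS = 1 - BS_new / BS_ref (reference model as baseline) and the toy probabilities are all assumptions made for the example, not details taken from the paper.

```python
import math

def brier_score(y_true, p):
    """Mean squared difference between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y_true, p)) / len(y_true)

def brier_skill_score(y_true, p_ref, p_new):
    """Assumed form: 1 - BS_new / BS_ref, positive when the new model has lower Brier score."""
    return 1.0 - brier_score(y_true, p_new) / brier_score(y_true, p_ref)

def confusion(y_true, p, threshold=0.5):
    """Return (tp, fp, tn, fn) after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for yi, pi in zip(y_true, p):
        pred = 1 if pi >= threshold else 0
        if pred == 1 and yi == 1:
            tp += 1
        elif pred == 1 and yi == 0:
            fp += 1
        elif pred == 0 and yi == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def f1_score(y_true, p, threshold=0.5):
    """F1 for the event class; defined as 0.0 when there are no positives at all."""
    tp, fp, _, fn = confusion(y_true, p, threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(y_true, p, threshold=0.5):
    """Matthews correlation coefficient; 0.0 when any marginal count is zero."""
    tp, fp, tn, fn = confusion(y_true, p, threshold)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def compare_models(y_true, p_ref, p_new, threshold=0.5):
    """Collect new-minus-reference differences for the three measures."""
    return {
        "bss": brier_skill_score(y_true, p_ref, p_new),
        "delta_f1": f1_score(y_true, p_new, threshold) - f1_score(y_true, p_ref, threshold),
        "delta_mcc": mcc(y_true, p_new, threshold) - mcc(y_true, p_ref, threshold),
    }

# Hypothetical toy sample with 20% events (2 of 10 observations).
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# Reference model: stays close to the majority class for every observation.
p_ref = [0.3, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]
# New model: the added variable separates the two events from the non-events.
p_new = [0.8, 0.6, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

result = compare_models(y, p_ref, p_new)
print(result)
```

In this toy case the reference model never predicts an event at the 0.5 threshold, so its F1-score and MCC are both zero. That is the kind of majority-class behaviour that makes threshold-based measures hard to interpret at strong imbalance.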

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c5c/11984743/95260a954555/pone.0321661.g001.jpg
