

Outcome class imbalance and rare events: An underappreciated complication for overdose risk prediction modeling.

Affiliations

Department of Epidemiology, Brown University School of Public Health, Providence, Rhode Island, USA.

Department of Emergency Medicine, Alpert Medical School of Brown University, Providence, Rhode Island, USA.

Publication information

Addiction. 2023 Jun;118(6):1167-1176. doi: 10.1111/add.16133. Epub 2023 Feb 6.

Abstract

BACKGROUND AND AIMS

Low outcome prevalence, often observed with opioid-related outcomes, poses an underappreciated challenge to accurate predictive modeling. Outcome class imbalance, where non-events (i.e. negative class observations) outnumber events (i.e. positive class observations) by a moderate to extreme degree, can distort measures of predictive accuracy in misleading ways, and make the overall predictive accuracy and the discriminatory ability of a predictive model appear spuriously high. We conducted a simulation study to measure the impact of outcome class imbalance on predictive performance of a simple SuperLearner ensemble model and suggest strategies for reducing that impact.

DESIGN, SETTING, PARTICIPANTS

Using a Monte Carlo design with 250 repetitions, we trained and evaluated these models on four simulated data sets with 100 000 observations each: one with perfect balance between events and non-events, and three where non-events outnumbered events by an approximate factor of 10:1, 100:1, and 1000:1, respectively.
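To give a sense of what these imbalance ratios mean in absolute counts, the following minimal sketch draws binary outcomes at the four ratios. It is illustrative only: the abstract does not describe the data-generating mechanism, so the Bernoulli draw, the function name `simulate_outcomes`, and the seed are assumptions, and no predictors or models are simulated here.

```python
import random


def simulate_outcomes(n, ratio, seed=0):
    """Draw n binary outcomes where non-events outnumber events by about ratio:1.

    Hypothetical helper: a plain Bernoulli draw, not the study's simulation.
    """
    rng = random.Random(seed)
    prevalence = 1.0 / (1.0 + ratio)  # ratio=1 gives 0.5, ratio=1000 gives ~0.001
    return [1 if rng.random() < prevalence else 0 for _ in range(n)]


N = 100_000  # observations per data set, as stated in the abstract
for ratio in (1, 10, 100, 1000):
    y = simulate_outcomes(N, ratio)
    events = sum(y)
    print(f"{ratio}:1 -> {events} events, {N - events} non-events")
```

At the 1000:1 ratio only about 100 of the 100 000 observations are events, which is why the positive class carries almost no weight in a summary such as overall accuracy.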

MEASUREMENTS

We evaluated the performance of these models using a comprehensive suite of measures, including measures that are more appropriate for imbalanced data.

FINDINGS

Increasing imbalance tended to spuriously improve overall accuracy (using a high threshold to classify events vs non-events, overall accuracy improved from 0.45 with perfect balance to 0.99 with the most severe outcome class imbalance), but diminished predictive performance was evident using other metrics (corresponding positive predictive value decreased from 0.99 to 0.14).
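The direction of both effects follows from how the two metrics depend on prevalence, which the sketch below works through. It assumes a hypothetical classifier with fixed sensitivity 0.10 and specificity 0.999 (roughly what a high classification threshold produces); these values are not taken from the study, and holding them fixed across data sets is a simplification, since the study refit its models on each data set.

```python
def accuracy_and_ppv(sensitivity, specificity, prevalence):
    """Return (overall accuracy, positive predictive value) as population proportions."""
    tp = sensitivity * prevalence                # true positives
    tn = specificity * (1.0 - prevalence)        # true negatives
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false positives
    accuracy = tp + tn
    ppv = tp / (tp + fp)
    return accuracy, ppv


# Hypothetical operating point for a high-threshold classifier (an assumption).
SENS, SPEC = 0.10, 0.999

for ratio in (1, 10, 100, 1000):
    prevalence = 1.0 / (1.0 + ratio)
    acc, ppv = accuracy_and_ppv(SENS, SPEC, prevalence)
    print(f"{ratio}:1  accuracy={acc:.3f}  PPV={ppv:.3f}")
```

As non-events come to dominate, accuracy converges on the specificity, because nearly every observation is a correctly classified non-event. PPV falls at the same time, because even a small false-positive rate applied to a very large negative class outnumbers the true positives.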

CONCLUSION

Increasing reliance on algorithmic risk scores in consequential decision-making processes raises critical fairness and ethical concerns. This paper provides broad guidance for analytic strategies that clinical investigators can use to remedy the impacts of outcome class imbalance on risk prediction tools.

