
Evaluating and mitigating bias in machine learning models for cardiovascular disease prediction.

Affiliations

College of Art and Science, Vanderbilt University, Nashville, TN, USA.

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.

Publication Information

J Biomed Inform. 2023 Feb;138:104294. doi: 10.1016/j.jbi.2023.104294. Epub 2023 Jan 24.


DOI: 10.1016/j.jbi.2023.104294
PMID: 36706849
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11104322/
Abstract

OBJECTIVE: This study investigates whether machine learning-based predictive models for cardiovascular disease (CVD) risk assessment perform equivalently across demographic groups (such as race and gender), and whether bias mitigation methods can reduce any bias present in the models. This matters because systematic bias may be introduced when health data are collected and preprocessed, which can degrade model performance on certain demographic sub-cohorts. The study examines this question using electronic health records data and a range of machine learning models.

METHODS: The study used a large de-identified Electronic Health Records dataset from Vanderbilt University Medical Center (VUMC). Machine learning (ML) algorithms including logistic regression, random forest, gradient-boosting trees, and long short-term memory networks were applied to build multiple predictive models. Model bias and fairness were evaluated using equal opportunity difference (EOD, where 0 indicates fairness) and disparate impact (DI, where 1 indicates fairness). We also evaluated the fairness of a non-ML baseline model, the American Heart Association (AHA) Pooled Cohort Risk Equations (PCEs). In addition, we compared the performance of three de-biasing methods: removing protected attributes (e.g., race and gender), resampling the imbalanced training dataset by sample size, and resampling by the proportion of people with CVD outcomes.

RESULTS: The study cohort included 109,490 individuals (mean [SD] age 47.4 [14.7] years; 64.5% female; 86.3% White; 13.7% Black). The experimental results suggested that most ML models had smaller EOD and DI than the PCEs. For ML models, the mean EOD ranged from -0.001 to 0.018 and the mean DI ranged from 1.037 to 1.094 across race groups. EOD and DI were larger across gender groups, with EOD ranging from 0.131 to 0.136 and DI ranging from 1.535 to 1.587. Among the de-biasing methods, removing protected attributes did not significantly reduce bias for most ML models, and resampling by sample size did not consistently decrease bias either. Resampling by case proportion reduced the EOD and DI for gender groups, but slightly reduced accuracy in many cases.

CONCLUSIONS: In the VUMC cohort, both the PCEs and the ML models were biased against women, suggesting the need to investigate and correct gender disparities in CVD risk prediction. Resampling by proportion reduced bias for gender groups but not for race groups.
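The two fairness metrics used in the abstract can be made concrete with a short sketch. This is illustrative code, not the authors' implementation: it assumes binary predictions and a binary protected attribute (with group 0 as the unprivileged group), and all data and function names are hypothetical.

```python
# Illustrative sketch (not the paper's code) of the two fairness metrics
# described in the abstract, for binary predictions and a binary
# protected attribute (0 = unprivileged group, 1 = privileged group).

def equal_opportunity_difference(y_true, y_pred, group):
    """EOD = TPR(unprivileged) - TPR(privileged); 0 indicates fairness."""
    def tpr(g):
        # predictions for the true positives within group g
        preds = [p for t, p, s in zip(y_true, y_pred, group) if s == g and t == 1]
        return sum(preds) / len(preds)
    return tpr(0) - tpr(1)

def disparate_impact(y_pred, group):
    """DI = P(pred=1 | unprivileged) / P(pred=1 | privileged); 1 indicates fairness."""
    def rate(g):
        preds = [p for p, s in zip(y_pred, group) if s == g]
        return sum(preds) / len(preds)
    return rate(0) / rate(1)

# Toy data: 8 patients, first 4 in group 0, last 4 in group 1.
y_true = [1, 1, 0, 1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1, 0, 1]
group  = [0, 0, 0, 0, 1, 1, 1, 1]
print(equal_opportunity_difference(y_true, y_pred, group))  # -> -0.333... (group 0 disadvantaged)
print(disparate_impact(y_pred, group))                      # -> 0.666...
```

On this toy data the unprivileged group has a lower true positive rate (2/3 vs. 1.0) and a lower positive-prediction rate (0.5 vs. 0.75), so EOD is below 0 and DI is below 1, both signalling bias against group 0.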
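The de-biasing method the abstract reports as most effective for gender groups, resampling by case proportion, can be sketched as follows. This is a hypothetical illustration of the general idea (oversampling positive cases in the group whose outcome proportion is lower until the groups match), not the paper's actual procedure; the function name and data layout are assumptions.

```python
# Hypothetical sketch of "resampling by case proportion": oversample
# positive (CVD) cases in the group whose outcome proportion is lower,
# until both groups have roughly the same proportion of positive outcomes.
import random

def resample_by_case_proportion(rows, seed=0):
    """rows: list of (features, outcome, group) with binary outcome/group."""
    rng = random.Random(seed)
    cases = {g: [r for r in rows if r[2] == g and r[1] == 1] for g in (0, 1)}
    sizes = {g: sum(1 for r in rows if r[2] == g) for g in (0, 1)}
    props = {g: len(cases[g]) / sizes[g] for g in (0, 1)}
    low = min(props, key=props.get)   # group with the lower case proportion
    target = props[1 - low]           # match the other group's proportion
    out = list(rows)
    n_cases, n_total = len(cases[low]), sizes[low]
    # sample positive cases with replacement until the proportions match
    while n_cases / n_total < target:
        out.append(rng.choice(cases[low]))
        n_cases += 1
        n_total += 1
    return out

# Toy data: group 0 has 1 case in 4 rows (25%), group 1 has 2 in 4 (50%).
rows = [(("a",), 1, 0), (("b",), 0, 0), (("c",), 0, 0), (("d",), 0, 0),
        (("e",), 1, 1), (("f",), 1, 1), (("g",), 0, 1), (("h",), 0, 1)]
balanced = resample_by_case_proportion(rows)
print(len(balanced))  # -> 10 (two group-0 cases were duplicated)
```

Because each duplicated row is a positive case, every iteration raises the low group's case proportion toward 1, so the loop terminates whenever the target proportion is below 1.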

Similar Articles

[1]
Evaluating and mitigating bias in machine learning models for cardiovascular disease prediction.

J Biomed Inform. 2023-2

[2]
Comparison of Methods to Reduce Bias From Clinical Prediction Models of Postpartum Depression.

JAMA Netw Open. 2021-4-1

[3]
Fairness in Predicting Cancer Mortality Across Racial Subgroups.

JAMA Netw Open. 2024-7-1

[4]
Predictive Accuracy of Stroke Risk Prediction Models Across Black and White Race, Sex, and Age Groups.

JAMA. 2023-1-24

[5]
Assessing fairness in machine learning models: A study of racial bias using matched counterparts in mortality prediction for patients with chronic diseases.

J Biomed Inform. 2024-8

[6]
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022-2-1

[7]
Evaluating Bias-Mitigated Predictive Models of Perinatal Mood and Anxiety Disorders.

JAMA Netw Open. 2024-12-2

[8]
Fairness in Mobile Phone-Based Mental Health Assessment Algorithms: Exploratory Study.

JMIR Form Res. 2022-6-14

[9]
Evaluating machine learning model bias and racial disparities in non-small cell lung cancer using SEER registry data.

Health Care Manag Sci. 2024-12

[10]
Algorithmic Fairness of Machine Learning Models for Alzheimer Disease Progression.

JAMA Netw Open. 2023-11-1

Cited By

[1]
Enhancing heart disease prediction with stacked ensemble and MCDM-based ranking: an optimized RST-ML approach.

Front Digit Health. 2025-6-19

[2]
Computational strategic recruitment for representation and coverage studied in the All of Us Research Program.

NPJ Digit Med. 2025-7-3

[3]
Impact of analytical bias on machine learning models for sepsis prediction using laboratory data.

Clin Chem Lab Med. 2025-5-28

[4]
Enhancement of Fairness in AI for Chest X-ray Classification.

AMIA Annu Symp Proc. 2025-5-22

[5]
Toward Identifying New Risk Aversions and Subsequent Limitations and Biases When Making De-identified Structured Data Sets Openly Available in a Post-LLM world.

AMIA Annu Symp Proc. 2025-5-22

[6]
Machine learning based prediction models for cardiovascular disease risk using electronic health records data: systematic review and meta-analysis.

Eur Heart J Digit Health. 2024-10-27

[7]
Mitigating Algorithmic Bias in AI-Driven Cardiovascular Imaging for Fairer Diagnostics.

Diagnostics (Basel). 2024-11-27

[8]
Evaluating and Reducing Subgroup Disparity in AI Models: An Analysis of Pediatric COVID-19 Test Outcomes.

medRxiv. 2024-9-19

[9]
Mitigating Sociodemographic Bias in Opioid Use Disorder Prediction: Fairness-Aware Machine Learning Framework.

JMIR AI. 2024-8-20

[10]
Large Language Models for Wearable Sensor-Based Human Activity Recognition, Health Monitoring, and Behavioral Modeling: A Survey of Early Trends, Datasets, and Challenges.

Sensors (Basel). 2024-8-4

References

[1]
Subpopulation-specific machine learning prognosis for underrepresented patients with double prioritized bias correction.

Commun Med (Lond). 2022-9-1

[2]
Pregnancy and Reproductive Risk Factors for Cardiovascular Disease in Women.

Circ Res. 2022-2-18

[3]
Addressing Fairness, Bias, and Appropriate Use of Artificial Intelligence and Machine Learning in Global Health.

Front Artif Intell. 2021-4-15

[4]
Comparison of Methods to Reduce Bias From Clinical Prediction Models of Postpartum Depression.

JAMA Netw Open. 2021-4-1

[5]
Heart Disease and Stroke Statistics-2021 Update: A Report From the American Heart Association.

Circulation. 2021-2-23

[6]
Ethical limitations of algorithmic fairness solutions in health care machine learning.

Lancet Digit Health. 2020-5

[7]
Machine learning prediction in cardiovascular diseases: a meta-analysis.

Sci Rep. 2020-9-29

[8]
Precision Medicine, AI, and the Future of Personalized Health Care.

Clin Transl Sci. 2021-1

[9]
Hidden in Plain Sight - Reconsidering the Use of Race Correction in Clinical Algorithms.

N Engl J Med. 2020-8-27

[10]
Predictably unequal: understanding and addressing concerns that algorithmic clinical prediction may increase health disparities.

NPJ Digit Med. 2020-7-30
