Evaluating and mitigating bias in machine learning models for cardiovascular disease prediction.

Affiliations

College of Art and Science, Vanderbilt University, Nashville, TN, USA.

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.

Publication information

J Biomed Inform. 2023 Feb;138:104294. doi: 10.1016/j.jbi.2023.104294. Epub 2023 Jan 24.

DOI:10.1016/j.jbi.2023.104294

PMID:36706849

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11104322/

Abstract

OBJECTIVE

This study investigates whether machine learning-based predictive models for cardiovascular disease (CVD) risk assessment perform equivalently across demographic groups (such as race and gender), and whether bias mitigation methods can reduce any bias present in the models. This matters because systematic bias may be introduced when health data are collected and preprocessed, which could degrade model performance on certain demographic sub-cohorts. The investigation uses electronic health records data and a range of machine learning models.

METHODS

The study used large de-identified electronic health records data from Vanderbilt University Medical Center. Machine learning (ML) algorithms, including logistic regression, random forest, gradient-boosting trees, and long short-term memory (LSTM), were applied to build multiple predictive models. Model bias and fairness were evaluated using equal opportunity difference (EOD; 0 indicates fairness) and disparate impact (DI; 1 indicates fairness). The fairness of a non-ML baseline model, the American Heart Association (AHA) Pooled Cohort Risk Equations (PCEs), was also evaluated. In addition, three de-biasing methods were compared: removing protected attributes (e.g., race and gender), resampling the imbalanced training dataset by sample size, and resampling by the proportion of people with CVD outcomes.
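The two fairness metrics named above can be illustrated with a minimal sketch. It assumes the common definitions: EOD as the difference in true positive rates between two groups, and DI as the ratio of their positive-prediction rates. The abstract does not state which group is the reference, so the ordering of `group_a` and `group_b`, the function name `fairness_metrics`, and the toy data are all assumptions for illustration, not the authors' code.

```python
from typing import Hashable, List, Sequence, Tuple


def _mean(flags: Sequence[int]) -> float:
    """Fraction of 1s in a non-empty 0/1 sequence."""
    if len(flags) == 0:
        raise ValueError("cannot compute a rate on an empty group")
    return sum(flags) / len(flags)


def fairness_metrics(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    groups: Sequence[Hashable],
    group_a: Hashable,
    group_b: Hashable,
) -> Tuple[float, float]:
    """Return (EOD, DI) comparing group_a against group_b.

    EOD = TPR(group_a) - TPR(group_b); 0 indicates fairness.
    DI  = P(pred=1 | group_a) / P(pred=1 | group_b); 1 indicates fairness.
    The direction (a versus b) is an assumption of this sketch.
    """

    def select(g: Hashable) -> Tuple[List[int], List[int]]:
        idx = [i for i, v in enumerate(groups) if v == g]
        return [y_true[i] for i in idx], [y_pred[i] for i in idx]

    true_a, pred_a = select(group_a)
    true_b, pred_b = select(group_b)

    # True positive rate: share of actual cases that were predicted positive.
    tpr_a = _mean([p for t, p in zip(true_a, pred_a) if t == 1])
    tpr_b = _mean([p for t, p in zip(true_b, pred_b) if t == 1])

    # Positive-prediction rate: share of the group predicted positive.
    rate_a = _mean(pred_a)
    rate_b = _mean(pred_b)
    if rate_b == 0:
        raise ValueError("DI is undefined when group_b has no positive predictions")

    return tpr_a - tpr_b, rate_a / rate_b


# Toy labels and predictions (hypothetical, not study data).
toy_true = [1, 1, 0, 0, 1, 1, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 0, 0, 0]
toy_group = ["M", "M", "M", "M", "F", "F", "F", "F"]

eod, di = fairness_metrics(toy_true, toy_pred, toy_group, "M", "F")
print(f"EOD = {eod:.3f}, DI = {di:.3f}")
```

With this ordering, a positive EOD or a DI above 1 means the model flags true cases, or predicts risk, more often in `group_a` than in `group_b`.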

RESULTS

The study cohort included 109,490 individuals (mean [SD] age 47.4 [14.7] years; 64.5% female; 86.3% White; 13.7% Black). The experimental results suggested that most ML models had smaller EOD and DI than PCEs. For ML models, the mean EOD ranged from -0.001 to 0.018 and the mean DI ranged from 1.037 to 1.094 across race groups. EOD and DI were larger across gender groups, with EOD ranging from 0.131 to 0.136 and DI ranging from 1.535 to 1.587. Among the de-biasing methods, removing protected attributes did not significantly reduce the bias for most ML models. Resampling by sample size also did not consistently decrease bias. Resampling by case proportion reduced the EOD and DI for gender groups but slightly reduced accuracy in many cases.
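The two resampling strategies compared above can be sketched as follows. The abstract does not say whether the authors upsampled or downsampled, so this sketch assumes downsampling in both cases: the first function equalizes the number of records per group, and the second keeps all cases and drops non-cases until every group matches the highest case proportion observed. The function names, the `group` and `cvd` keys, and the toy rows are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List


def resample_by_size(rows: List[Dict], group_key: str, seed: int = 0) -> List[Dict]:
    """Downsample every group to the size of the smallest group."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group_key]].append(row)
    n = min(len(members) for members in by_group.values())
    out: List[Dict] = []
    for members in by_group.values():
        out.extend(rng.sample(members, n))
    return out


def resample_by_case_proportion(
    rows: List[Dict], group_key: str, label_key: str, seed: int = 0
) -> List[Dict]:
    """Drop non-cases so each group reaches the highest observed case proportion."""
    rng = random.Random(seed)
    cases = defaultdict(list)
    controls = defaultdict(list)
    for row in rows:
        bucket = cases if row[label_key] == 1 else controls
        bucket[row[group_key]].append(row)

    names = sorted(set(cases) | set(controls))
    if any(len(cases[g]) == 0 for g in names):
        raise ValueError("every group needs at least one case")

    # Target: the largest case proportion among the groups.
    target = max(len(cases[g]) / (len(cases[g]) + len(controls[g])) for g in names)

    out: List[Dict] = []
    for g in names:
        n_cases = len(cases[g])
        # Non-cases to keep so that n_cases / (n_cases + n_keep) is about target.
        n_keep = min(len(controls[g]), round(n_cases * (1 - target) / target))
        out.extend(cases[g])
        out.extend(rng.sample(controls[g], n_keep))
    return out


# Toy training rows (hypothetical): group A has 50% cases, group B has 20%.
toy_rows = (
    [{"group": "A", "cvd": 1}] * 4
    + [{"group": "A", "cvd": 0}] * 4
    + [{"group": "B", "cvd": 1}] * 2
    + [{"group": "B", "cvd": 0}] * 8
)

print(len(resample_by_size(toy_rows, "group")))
print(len(resample_by_case_proportion(toy_rows, "group", "cvd")))
```

Downsampling discards data, which may be one reason a resampled model loses some accuracy; upsampling the smaller group is an equally plausible reading of the abstract.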

CONCLUSIONS

In the Vanderbilt University Medical Center (VUMC) cohort, both PCEs and ML models were biased against women, suggesting the need to investigate and correct gender disparities in CVD risk prediction. Resampling by proportion reduced the bias for gender groups but not for race groups.
