College of Arts and Science, Vanderbilt University, Nashville, TN, USA.
Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.
J Biomed Inform. 2023 Feb;138:104294. doi: 10.1016/j.jbi.2023.104294. Epub 2023 Jan 24.
OBJECTIVE: This study investigates whether machine learning-based predictive models for cardiovascular disease (CVD) risk assessment perform equivalently across demographic groups (such as race and gender), and whether bias mitigation methods can reduce any bias present in the models. This matters because systematic bias may be introduced when health data are collected and preprocessed, which could affect model performance on certain demographic sub-cohorts. The investigation uses electronic health records data and several machine learning models.
METHODS: The study used large de-identified electronic health records data from Vanderbilt University Medical Center (VUMC). Machine learning (ML) algorithms, including logistic regression, random forest, gradient-boosting trees, and long short-term memory, were applied to build multiple predictive models. Model bias and fairness were evaluated using equal opportunity difference (EOD, 0 indicates fairness) and disparate impact (DI, 1 indicates fairness). The fairness of a non-ML baseline model, the American Heart Association (AHA) Pooled Cohort Risk Equations (PCEs), was also evaluated. In addition, three de-biasing methods were compared: removing protected attributes (e.g., race and gender), resampling the imbalanced training dataset by sample size, and resampling by the proportion of people with CVD outcomes.
RESULTS: The study cohort included 109,490 individuals (mean [SD] age 47.4 [14.7] years; 64.5% female; 86.3% White; 13.7% Black). The experimental results suggested that most ML models had smaller EOD and DI than PCEs. For ML models, the mean EOD ranged from -0.001 to 0.018 and the mean DI ranged from 1.037 to 1.094 across race groups. EOD and DI were larger across gender groups, with EOD ranging from 0.131 to 0.136 and DI ranging from 1.535 to 1.587. Among the de-biasing methods, removing protected attributes did not significantly reduce the bias for most ML models. Resampling by sample size also did not consistently decrease bias. Resampling by case proportion reduced the EOD and DI for gender groups but slightly reduced accuracy in many cases.
CONCLUSIONS: In the VUMC cohort, both PCEs and ML models were biased against women, suggesting the need to investigate and correct gender disparities in CVD risk prediction. Resampling by proportion reduced the bias for gender groups but not for race groups.
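The two fairness metrics named in the abstract can be illustrated with a minimal sketch. It assumes the common definitions: EOD as the difference in true positive rates between two groups, and DI as the ratio of their positive-prediction rates. The abstract does not state which group is the reference or the direction of the difference and ratio, so the reference-minus-comparison and reference-over-comparison conventions below are assumptions, as are the function names and the toy data.

```python
from typing import Sequence, Tuple


def true_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of actual positives that the model predicted as positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_for_positives:
        raise ValueError("group has no positive cases")
    return sum(preds_for_positives) / len(preds_for_positives)


def positive_rate(y_pred: Sequence[int]) -> float:
    """Fraction of the group that the model predicted as positive."""
    if not y_pred:
        raise ValueError("group is empty")
    return sum(y_pred) / len(y_pred)


def fairness_metrics(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    group: Sequence[str],
    reference: str,
) -> Tuple[float, float]:
    """Return (EOD, DI) for the reference group versus everyone else.

    Assumed conventions (not confirmed by the abstract):
      EOD = TPR(reference) - TPR(comparison)   -> 0 indicates fairness
      DI  = P(pred=1 | reference) / P(pred=1 | comparison) -> 1 indicates fairness
    """
    ref = [i for i, g in enumerate(group) if g == reference]
    cmp_ = [i for i, g in enumerate(group) if g != reference]

    tpr_ref = true_positive_rate([y_true[i] for i in ref], [y_pred[i] for i in ref])
    tpr_cmp = true_positive_rate([y_true[i] for i in cmp_], [y_pred[i] for i in cmp_])

    rate_ref = positive_rate([y_pred[i] for i in ref])
    rate_cmp = positive_rate([y_pred[i] for i in cmp_])
    if rate_cmp == 0:
        raise ValueError("comparison group has no positive predictions")

    return tpr_ref - tpr_cmp, rate_ref / rate_cmp


# Toy data, invented purely for illustration (not from the study).
group = ["M", "M", "M", "M", "F", "F", "F", "F"]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

eod, di = fairness_metrics(y_true, y_pred, group, reference="M")
print(f"EOD = {eod:.3f}, DI = {di:.3f}")
```

In this toy example the model finds every true case among men but only half among women, so EOD departs from 0 and DI departs from 1, the same qualitative pattern the abstract reports for gender groups.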