Sex-Based Performance Disparities in Machine Learning Algorithms for Cardiac Disease Prediction: Exploratory Study.

Affiliation

University College London, London, United Kingdom.

Publication information

J Med Internet Res. 2024 Aug 26;26:e46936. doi: 10.2196/46936.

Abstract

BACKGROUND

The presence of bias in artificial intelligence has garnered increased attention, with inequities in algorithmic performance being exposed across the fields of criminal justice, education, and welfare services. In health care, the inequitable performance of algorithms across demographic groups may widen health inequalities.

OBJECTIVE

Here, we identify and characterize bias in cardiology algorithms, looking specifically at algorithms used in the management of heart failure.

METHODS

Stage 1 involved a literature search of PubMed and Web of Science for key terms relating to cardiac machine learning (ML) algorithms. Papers that built ML models to predict cardiac disease were evaluated for their focus on demographic bias in model performance, and open-source data sets were retained for our investigation. Two open-source data sets were identified: (1) the University of California Irvine Heart Failure data set and (2) the University of California Irvine Coronary Artery Disease data set. In stage 2, we reproduced existing algorithms that have been reported for these data sets, tested them for sex biases in algorithm performance, and assessed a range of remediation techniques for their efficacy in reducing inequities. Particular attention was paid to the false negative rate (FNR) because of the clinical significance of underdiagnosis and missed opportunities for treatment.
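As an illustration of the kind of per-sex error analysis described above, the following minimal Python sketch computes the FNR and false positive rate separately for each group. The function name `error_rates_by_group`, the group labels, and the toy labels and predictions are assumptions made for illustration; they are not taken from the study's code or data.

```python
from collections import defaultdict


def error_rates_by_group(y_true, y_pred, groups):
    """Return {group: {"fnr": ..., "fpr": ...}} for binary labels (1 = disease)."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "fp" if pred == 1 else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]  # patients who truly have the disease
        negatives = c["fp"] + c["tn"]  # patients who truly do not
        rates[group] = {
            # FNR: share of true cases the model missed
            "fnr": c["fn"] / positives if positives else float("nan"),
            # FPR: share of healthy patients flagged as diseased
            "fpr": c["fp"] / negatives if negatives else float("nan"),
        }
    return rates


# Hypothetical toy data: six male ("M") and six female ("F") patients.
y_true = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0]
sex = ["M"] * 6 + ["F"] * 6

rates = error_rates_by_group(y_true, y_pred, sex)
print(rates)
```

In this toy example the model misses a larger share of true cases among the female group, which is the pattern of underdiagnosis that the FNR is meant to surface.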

RESULTS

In stage 1, our literature search returned 127 papers, with 60 meeting the criteria for a full review and only 3 papers highlighting sex differences in algorithm performance. In the papers that reported sex, there was a consistent underrepresentation of female patients in the data sets. No papers investigated racial or ethnic differences. In stage 2, we reproduced algorithms reported in the literature, achieving mean accuracies of 84.24% (SD 3.51%) for data set 1 and 85.72% (SD 1.75%) for data set 2 (random forest models). For data set 1, the FNR was significantly higher for female patients in 13 out of 16 experiments (-17.81% to -3.37%; P<.05). A smaller disparity in the false positive rate (FPR), with higher rates for male patients, was significant in 13 out of 16 experiments (-0.48% to +9.77%; P<.05). Overall, we observed an overprediction of disease for male patients (higher FPR) and an underprediction of disease for female patients (higher FNR). Sex differences in feature importance suggest that feature selection needs to be demographically tailored.
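The abstract does not state which statistical test produced the reported P values or how the disparity ranges are signed. As one possible way to test a sex disparity in FNR across repeated runs, the sketch below applies an exact paired sign-flip permutation test to per-run differences. The male-minus-female sign convention, the function name `paired_sign_flip_p_value`, and the per-run FNR values are all assumptions for illustration, not results from the study.

```python
from itertools import product
from statistics import mean


def paired_sign_flip_p_value(differences):
    """Two-sided exact sign-flip permutation test of mean(differences) == 0."""
    observed = abs(mean(differences))
    extreme = 0
    total = 0
    # Under the null hypothesis each paired difference is equally likely
    # to have either sign, so enumerate every possible sign assignment.
    for signs in product((1, -1), repeat=len(differences)):
        total += 1
        permuted = abs(mean([s * d for s, d in zip(signs, differences)]))
        if permuted >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical per-run FNRs (for example, one value per cross-validation fold).
male_fnr = [0.10, 0.15, 0.08, 0.12, 0.16, 0.11]
female_fnr = [0.20, 0.20, 0.20, 0.20, 0.20, 0.20]

# Assumed convention: male minus female, so negative means a higher female FNR.
differences = [m - f for m, f in zip(male_fnr, female_fnr)]
p_value = paired_sign_flip_p_value(differences)
print(f"mean difference = {mean(differences):.4f}, p = {p_value:.5f}")
```

Exhaustive enumeration is only practical for a small number of runs (2^n sign assignments); with many runs, random sampling of sign flips or a standard paired test would be the usual substitute.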

CONCLUSIONS

Our research exposes a significant gap in cardiac ML research, highlighting that the underperformance of algorithms for female patients has been overlooked in the published literature. Our study quantifies sex disparities in algorithmic performance and explores several sources of bias. We found an underrepresentation of female patients in the data sets used to train algorithms, identified sex biases in model error rates, and demonstrated that a series of remediation techniques were unable to address the inequities present.
