Age, sex and race bias in automated arrhythmia detectors.

Affiliation

Department of Biomedical Informatics, School of Medicine, Emory University, United States of America.

Publication information

J Electrocardiol. 2022 Sep-Oct;74:5-9. doi: 10.1016/j.jelectrocard.2022.07.007. Epub 2022 Jul 18.

DOI: 10.1016/j.jelectrocard.2022.07.007
PMID: 35878534
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11486543/
Abstract

Despite the recent explosion of machine learning applied to medical data, very few studies have examined algorithmic bias in a meaningful manner by comparing across algorithms, databases, and assessment metrics. In this study, we compared the sex, age, and race biases of 56 algorithms on over 130,000 electrocardiograms (ECGs) using several metrics, and we propose a machine learning model design to reduce bias. Participants of the 2021 PhysioNet Challenge designed and implemented working, open-source algorithms to identify clinical diagnoses from 2-lead ECG recordings. We grouped the data from the training, validation, and test datasets by sex (male vs female), age (binned by decade), and race (Asian, Black, White, and Other) whenever possible. We computed recording-wise accuracy, area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), F-measure, and the Challenge Score for each of the 56 algorithms. We used the Mann-Whitney U and Kruskal-Wallis tests to assess the performance differences of the algorithms across these demographic groups. Group trends revealed similar values for the AUROC, AUPRC, and F-measure for the male and female groups across the training, validation, and test sets. However, recording-wise accuracies were 20% higher (p < 0.01) and the Challenge Score 12% lower (p = 0.02) for female subjects on the test set. AUPRC, F-measure, and the Challenge Score increased with age, while recording-wise accuracy and AUROC decreased with age. The results were similar for the training and test sets, but only recording-wise accuracy (12% decrease per decade, p < 0.01), Challenge Score (1% increase per decade, p < 0.01), and AUROC (1% decrease per decade, p < 0.01) were statistically different on the test set. We observed similar AUROC, AUPRC, Challenge Score, and F-measure values across the different race categories. However, recording-wise accuracies were significantly lower for Black subjects and higher for Asian subjects on the training (31% difference, p < 0.01) and test (39% difference, p < 0.01) sets. A top-performing model was then retrained using an additional constraint that simultaneously minimized differences in performance across sex, race, and age. This resulted in a modest reduction in performance, with a significant reduction in bias. This work demonstrates that biases manifest as a function of model architecture, population, cost function, and optimization metric, all of which should be closely examined in any model.
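
The abstract describes two ideas that a small example may make more concrete: computing recording-wise accuracy separately for each demographic group, and retraining with an extra term that penalizes differences between groups. The following is a minimal sketch only. The abstract does not give the form of the constraint, so the function names (`groupwise_accuracy`, `accuracy_gap`, `penalized_objective`), the max-minus-min gap, the `weight` parameter, and the toy data are all assumptions made for illustration and are not the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def groupwise_accuracy(y_true: Sequence[int], y_pred: Sequence[int],
                       groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Fraction of recordings classified correctly within each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def accuracy_gap(per_group: Dict[Hashable, float]) -> float:
    """Largest accuracy difference between any two groups (0 means parity)."""
    values = list(per_group.values())
    return max(values) - min(values)


def penalized_objective(base_loss: float, gaps: Sequence[float],
                        weight: float = 1.0) -> float:
    """Hypothetical objective: task loss plus a weighted sum of group gaps.

    One gap per protected attribute (for example sex, age, race) is assumed.
    """
    return base_loss + weight * sum(gaps)


# Toy data invented for illustration; not taken from the study.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 0]
sex = ["F", "F", "F", "F", "M", "M", "M", "M"]

acc_by_sex = groupwise_accuracy(y_true, y_pred, sex)
gap = accuracy_gap(acc_by_sex)
objective = penalized_objective(base_loss=0.4, gaps=[gap], weight=2.0)
print(acc_by_sex, gap, objective)
```

An accuracy gap like this one is not differentiable, so gradient-based training would need a smooth surrogate; the sketch only shows how a bias term can be traded off against task performance. For the significance tests named in the abstract, SciPy provides `scipy.stats.mannwhitneyu` (two groups) and `scipy.stats.kruskal` (three or more groups), which could be applied to the per-algorithm metric values of each group.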
