Detecting, Characterizing, and Mitigating Implicit and Explicit Racial Biases in Health Care Datasets With Subgroup Learnability: Algorithm Development and Validation Study.

Author Information

Gulamali Faris, Sawant Ashwin Shreekant, Liharska Lora, Horowitz Carol, Chan Lili, Hofer Ira, Singh Karandeep, Richardson Lynne, Mensah Emmanuel, Charney Alexander, Reich David, Hu Jianying, Nadkarni Girish

Affiliations

Icahn School of Medicine at Mount Sinai, 1468 Madison Avenue, New York, NY 10029, United States. Phone: 1 212 241 6500.

University of California, San Diego, San Diego, CA, United States.

Publication Information

J Med Internet Res. 2025 Sep 4;27:e71757. doi: 10.2196/71757.

Abstract

BACKGROUND

The growing adoption of diagnostic and prognostic algorithms in health care has led to concerns about the perpetuation of algorithmic bias against disadvantaged groups. Deep learning methods to detect and mitigate bias have revolved around modifying models, optimization strategies, and threshold calibration, with varying levels of success and tradeoffs. However, there have been limited substantive efforts to address bias at the level of the data used to generate algorithms in health care datasets.

OBJECTIVE

The aim of this study is to create a simple metric (AEquity) that uses a learning curve approximation to distinguish and mitigate bias via guided dataset collection or relabeling.
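The abstract does not spell out how the learning curve approximation is computed. As a rough illustration, one common approach is to fit a power-law learning curve to each subgroup's validation error at increasing training-set sizes and compare the fitted asymptotes; a subgroup whose error plateaus high is one the dataset serves poorly. The sketch below takes that approach; the function names and sample numbers are illustrative assumptions, not the paper's definition of AEquity.

```python
# A minimal sketch, assuming subgroup learnability can be approximated by
# fitting a power-law learning curve per subgroup; the exact AEquity metric
# is defined in the full paper, not here.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    # Classic learning-curve form: error decays as a * n^(-b) toward an asymptote c.
    return a * np.power(n, -b) + c

def asymptotic_error(sample_sizes, errors):
    # Fit the curve to (training-set size, validation error) pairs for one
    # subgroup and return the fitted asymptote c: the error the model would
    # approach with unlimited data from this subgroup.
    (a, b, c), _ = curve_fit(power_law, sample_sizes, errors,
                             p0=(1.0, 0.5, 0.1),
                             bounds=([0, 0, 0], [np.inf, 5.0, 1.0]))
    return c

# Hypothetical validation errors measured at increasing training-set sizes.
sizes = np.array([100, 500, 1000, 5000, 10000])
err_group_a = np.array([0.40, 0.28, 0.22, 0.15, 0.13])
err_group_b = np.array([0.45, 0.38, 0.35, 0.32, 0.31])

# A large gap flags the subgroup that would benefit most from guided
# collection or relabeling of additional examples.
gap = asymptotic_error(sizes, err_group_b) - asymptotic_error(sizes, err_group_a)
print(f"learnability gap: {gap:.3f}")
```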

METHODS

We demonstrate this metric on 2 well-known examples, chest X-rays and health care cost utilization, and detect novel biases in the National Health and Nutrition Examination Survey.

RESULTS

We demonstrated that using AEquity to guide data-centric collection for each diagnostic finding in the chest radiograph dataset decreased bias by 29% to 96.5%, as measured by differences in area under the curve. Next, we examined (1) whether AEquity works on intersectional populations and (2) whether it is invariant to the choice of fairness metric, not just area under the curve. For Black patients on Medicaid, at the intersection of race and socioeconomic status, AEquity-based interventions reduced bias across a number of fairness metrics: overall false negative rate by 33.3% (absolute bias reduction 1.88×10⁻¹, 95% CI 1.4×10⁻¹ to 2.5×10⁻¹; relative reduction 33.3%, 95% CI 26.6%-40%); precision bias by 7.50×10⁻² (95% CI 7.48×10⁻² to 7.51×10⁻²; relative reduction 94.6%, 95% CI 94.5%-94.7%); and false discovery rate bias by 94.5% (absolute reduction 3.50×10⁻², 95% CI 3.49×10⁻² to 3.50×10⁻²). Similarly, AEquity-guided data collection reduced bias by up to 80% on mortality prediction with the National Health and Nutrition Examination Survey (absolute bias reduction 0.08, 95% CI 0.07-0.09). We then benchmarked AEquity against state-of-the-art data-guided debiasing measures, balanced empirical risk minimization and calibration, and showed that AEquity-guided data collection outperforms both. Moreover, we demonstrated that AEquity works on fully connected networks; convolutional neural networks such as ResNet-50; transformer architectures such as ViT-B/16, a vision transformer with 86 million parameters; and nonparametric methods such as the Light Gradient-Boosting Machine.
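To make the guided-collection idea concrete, below is a minimal sketch on synthetic data. One loud simplification: the collection signal here is the per-subgroup validation AUC, standing in for the AEquity learnability score, and every name and number (make_group, the pool sizes, the 200-example batches) is hypothetical rather than taken from the paper.

```python
# A minimal sketch of guided data collection on synthetic data, assuming the
# per-subgroup validation AUC as the guidance signal (a stand-in for AEquity).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_group(n, shift):
    # Synthetic binary task; larger `shift` makes the subgroup harder to fit.
    X = rng.normal(shift, 1.0, size=(n, 5))
    y = (X.sum(axis=1) + rng.normal(0.0, 2.0, n) > 5 * shift).astype(int)
    return X, y

# Candidate pools to collect from, plus a held-out validation set per subgroup.
pools = {g: make_group(5000, s) for g, s in [("A", 0.0), ("B", 0.5)]}
vals = {g: make_group(1000, s) for g, s in [("A", 0.0), ("B", 0.5)]}

X_tr, y_tr = np.empty((0, 5)), np.empty(0, dtype=int)
clf = None
for _ in range(10):  # collection rounds
    if clf is None:
        worst = list(pools)  # seed round: sample every subgroup once
    else:
        aucs = {g: roc_auc_score(vals[g][1], clf.predict_proba(vals[g][0])[:, 1])
                for g in pools}
        worst = [min(aucs, key=aucs.get)]  # direct the next batch at the worst-served subgroup
    for g in worst:
        Xg, yg = pools[g]
        # Draw a fresh batch from this subgroup's pool (duplicate draws across
        # rounds are possible; acceptable for a sketch).
        idx = rng.choice(len(yg), size=200, replace=False)
        X_tr = np.vstack([X_tr, Xg[idx]])
        y_tr = np.concatenate([y_tr, yg[idx]])
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

final = {g: roc_auc_score(vals[g][1], clf.predict_proba(vals[g][0])[:, 1]) for g in pools}
print("final per-subgroup AUC:", final)
```

On this toy setup, the loop spends most of its sampling budget on the harder subgroup, which is the mechanism the results above credit for narrowing subgroup performance gaps.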

CONCLUSIONS

In short, we demonstrated that AEquity is a robust tool by applying it to different datasets, algorithms, and intersectional analyses and measuring its effectiveness with respect to a range of traditional fairness metrics.

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a2f/12410029/0af9e8e68f87/jmir-v27-e71757-g001.jpg
