文献检索文档翻译深度研究
Suppr Zotero 插件Zotero 插件
邀请有礼套餐&价格历史记录

新学期,新优惠

限时优惠:9月1日-9月22日

30天高级会员仅需29元

1天体验卡首发特惠仅需5.99元

了解详情
不再提醒
插件&应用
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
高级版
套餐订阅购买积分包
AI 工具
文献检索文档翻译深度研究
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2025

揭示并量化大语言模型在医学报告生成中的种族偏见。

Unmasking and quantifying racial bias of large language models in medical report generation.

作者信息

Yang Yifan, Liu Xiaoyu, Jin Qiao, Huang Furong, Lu Zhiyong

机构信息

National Institutes of Health (NIH), National Library of Medicine (NLM), National Center for Biotechnology Information (NCBI), Bethesda, MD, 20894, USA.

University of Maryland at College Park, Department of Computer Science, College Park, MD, 20742, USA.

出版信息

Commun Med (Lond). 2024 Sep 10;4(1):176. doi: 10.1038/s43856-024-00601-z.


DOI:10.1038/s43856-024-00601-z
PMID:39256622
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC11387737/
Abstract

BACKGROUND: Large language models like GPT-3.5-turbo and GPT-4 hold promise for healthcare professionals, but they may inadvertently inherit biases during their training, potentially affecting their utility in medical applications. Despite few attempts in the past, the precise impact and extent of these biases remain uncertain. METHODS: We use LLMs to generate responses that predict hospitalization, cost and mortality based on real patient cases. We manually examine the generated responses to identify biases. RESULTS: We find that these models tend to project higher costs and longer hospitalizations for white populations and exhibit optimistic views in challenging medical scenarios with much higher survival rates. These biases, which mirror real-world healthcare disparities, are evident in the generation of patient backgrounds, the association of specific diseases with certain racial and ethnic groups, and disparities in treatment recommendations, etc. CONCLUSIONS: Our findings underscore the critical need for future research to address and mitigate biases in language models, especially in critical healthcare applications, to ensure fair and accurate outcomes for all patients.

摘要

背景:像GPT-3.5-turbo和GPT-4这样的大语言模型对医疗保健专业人员具有重要意义,但它们在训练过程中可能会无意中继承偏差,这可能会影响它们在医疗应用中的效用。尽管过去有过一些尝试,但这些偏差的确切影响和程度仍不确定。 方法:我们使用大语言模型根据真实患者病例生成预测住院、费用和死亡率的回复。我们人工检查生成的回复以识别偏差。 结果:我们发现,这些模型往往预测白人的费用更高、住院时间更长,并且在生存率高得多的具有挑战性的医疗场景中表现出乐观的看法。这些反映现实世界医疗保健差异的偏差在患者背景的生成、特定疾病与某些种族和族裔群体的关联以及治疗建议的差异等方面很明显。结论:我们的研究结果强调了未来研究的迫切需求,即解决和减轻语言模型中的偏差,特别是在关键的医疗保健应用中,以确保所有患者都能获得公平和准确的结果。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e75/11387737/ddd6939ff5c6/43856_2024_601_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e75/11387737/3e99f5d1c245/43856_2024_601_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e75/11387737/ddd6939ff5c6/43856_2024_601_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e75/11387737/3e99f5d1c245/43856_2024_601_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e75/11387737/ddd6939ff5c6/43856_2024_601_Fig2_HTML.jpg

相似文献

[1]
Unmasking and quantifying racial bias of large language models in medical report generation.

Commun Med (Lond). 2024-9-10

[2]
Unmasking and Quantifying Racial Bias of Large Language Models in Medical Report Generation.

ArXiv. 2024-1-25

[3]
Fairness in AI-Driven Oncology: Investigating Racial and Gender Biases in Large Language Models.

Cureus. 2024-9-16

[4]
Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz's Theory of Basic Values.

JMIR Ment Health. 2024-4-9

[5]
Evaluating the Capabilities of Generative AI Tools in Understanding Medical Papers: Qualitative Study.

JMIR Med Inform. 2024-9-4

[6]
Assessing the potential of GPT-4 to perpetuate racial and gender biases in health care: a model evaluation study.

Lancet Digit Health. 2024-1

[7]
Using Large Language Models to Annotate Complex Cases of Social Determinants of Health in Longitudinal Clinical Records.

medRxiv. 2024-4-27

[8]
Implicit Bias

2025-1

[9]
Large Language Models Improve the Identification of Emergency Department Visits for Symptomatic Kidney Stones.

medRxiv. 2024-8-13

[10]
Evaluating Large Language Models for the National Premedical Exam in India: Comparative Analysis of GPT-3.5, GPT-4, and Bard.

JMIR Med Educ. 2024-2-21

引用本文的文献

[1]
Detecting Stigmatizing Language in Clinical Notes with Large Language Models for Addiction Care.

medRxiv. 2025-8-12

[2]
Harm Reduction Strategies for Thoughtful Use of Large Language Models in the Medical Domain: Perspectives for Patients and Clinicians.

J Med Internet Res. 2025-7-25

[3]
Digitalizing informed consent in healthcare: a scoping review.

BMC Health Serv Res. 2025-7-2

[4]
Evaluating artificial intelligence bias in nephrology: the role of diversity, equity, and inclusion in AI-driven decision-making and ethical regulation.

Front Artif Intell. 2025-5-27

[5]
Large Language Models in Medical Diagnostics: Scoping Review With Bibliometric Analysis.

J Med Internet Res. 2025-6-9

[6]
Racial bias in AI-mediated psychiatric diagnosis and treatment: a qualitative comparison of four large language models.

NPJ Digit Med. 2025-6-4

[7]
A scoping review on generative AI and large language models in mitigating medication related harm.

NPJ Digit Med. 2025-3-28

[8]
Red teaming ChatGPT in medicine to yield real-world insights on model behavior.

NPJ Digit Med. 2025-3-7

[9]
Implementing large language models in healthcare while balancing control, collaboration, costs and security.

NPJ Digit Med. 2025-3-6

[10]
Evaluating and addressing demographic disparities in medical large language models: a systematic review.

Int J Equity Health. 2025-2-26

本文引用的文献

[1]
Opportunities and challenges for ChatGPT and large language models in biomedicine and health.

Brief Bioinform. 2023-11-22

[2]
Assessing the potential of GPT-4 to perpetuate racial and gender biases in health care: a model evaluation study.

Lancet Digit Health. 2024-1

[3]
A large-scale dataset of patient summaries for retrieval-based clinical decision support systems.

Sci Data. 2023-12-18

[4]
Large language models propagate race-based medicine.

NPJ Digit Med. 2023-10-20

[5]
Evaluating GPT4 on Impressions Generation in Radiology Reports.

Radiology. 2023-6

[6]
Trends in Health Care Use Among Black and White Persons in the US, 1963-2019.

JAMA Netw Open. 2022-6-1

[7]
US Health Care Spending by Race and Ethnicity, 2002-2016.

JAMA. 2021-8-17

[8]
Guidelines To Writing A Clinical Case Report.

Heart Views. 2017

[9]
Health care expenditures among Asian American subgroups.

Med Care Res Rev. 2012-12-4

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

推荐工具

医学文档翻译智能文献检索