
Large language models propagate race-based medicine.

Author Information

Omiye Jesutofunmi A, Lester Jenna C, Spichak Simon, Rotemberg Veronica, Daneshjou Roxana

Affiliations

Department of Dermatology, Stanford School of Medicine, Stanford, CA, USA.

Department of Biomedical Data Science, Stanford School of Medicine, Stanford, CA, USA.

Publication Information

NPJ Digit Med. 2023 Oct 20;6(1):195. doi: 10.1038/s41746-023-00939-z.


DOI: 10.1038/s41746-023-00939-z
PMID: 37864012
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10589311/
Abstract

Large language models (LLMs) are being integrated into healthcare systems, but these models may recapitulate harmful, race-based medicine. The objective of this study is to assess whether four commercially available LLMs propagate harmful, inaccurate, race-based content when responding to eight different scenarios that check for race-based medicine or widespread misconceptions around race. Questions were derived from discussions among four physician experts and from prior work on race-based medical misconceptions believed by medical trainees. We assessed the four models with nine different questions, each posed five times, for a total of 45 responses per model. All models produced examples of perpetuating race-based medicine in their responses, and models were not always consistent when asked the same question repeatedly. LLMs are being proposed for use in healthcare settings, with some models already connected to electronic health record systems. Our findings suggest that these LLMs could cause harm by perpetuating debunked, racist ideas.
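
The evaluation protocol is straightforward to reproduce in outline: ask each of the four models each question five times, collect the 45 responses per model, and compare repeated runs for consistency. Below is a minimal Python sketch of that query loop, assuming a hypothetical query_model() helper and placeholder model and question names rather than the authors' actual harness:

import collections

# Minimal sketch of the repeated-query protocol described in the abstract.
# Four models x nine questions x five runs = 45 responses per model.
# MODELS, QUESTIONS, and query_model() are illustrative placeholders.

MODELS = ["model_a", "model_b", "model_c", "model_d"]          # four commercial LLMs
QUESTIONS = [f"scenario_question_{i}" for i in range(1, 10)]   # nine scenario prompts
RUNS_PER_QUESTION = 5                                          # repeats probe consistency

def query_model(model: str, prompt: str) -> str:
    # Placeholder: substitute a real chat-completion API call here.
    return f"stub answer from {model}"

responses = collections.defaultdict(list)  # (model, question) -> list of answers
for model in MODELS:
    for question in QUESTIONS:
        for _ in range(RUNS_PER_QUESTION):
            responses[(model, question)].append(query_model(model, question))

# Flag run-to-run disagreement: the paper notes models did not always
# answer the same question the same way across repetitions.
for (model, question), answers in responses.items():
    if len(set(answers)) > 1:
        print(f"{model}: inconsistent answers for {question}")

Run-to-run disagreement is the part that is easy to automate; judging whether a given answer actually perpetuates a debunked race-based claim was done by physician review in the study, not by code.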


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ac4/10589311/9ce8fa30b67e/41746_2023_939_Fig1_HTML.jpg

Similar Articles

[1]
Large language models propagate race-based medicine.

NPJ Digit Med. 2023-10-20

[2]
Quality of Answers of Generative Large Language Models Versus Peer Users for Interpreting Laboratory Test Results for Lay Patients: Evaluation Study.

J Med Internet Res. 2024-4-17

[3]
Large Language Models Can Enable Inductive Thematic Analysis of a Social Media Corpus in a Single Prompt: Human Validation Study.

JMIR Infodemiology. 2024-8-29

[4]
Quality of Answers of Generative Large Language Models vs Peer Patients for Interpreting Lab Test Results for Lay Patients: Evaluation Study.

ArXiv. 2024-1-23

[5]
Assessing the research landscape and clinical utility of large language models: a scoping review.

BMC Med Inform Decis Mak. 2024-3-12

[6]
On the role of the UMLS in supporting diagnosis generation proposed by Large Language Models.

J Biomed Inform. 2024-9

[7]
Large Language Models and Medical Education: Preparing for a Rapid Transformation in How Trainees Will Learn to Be Doctors.

ATS Sch. 2023-6-14

[8]
Triage Performance Across Large Language Models, ChatGPT, and Untrained Doctors in Emergency Medicine: Comparative Study.

J Med Internet Res. 2024-6-14

[9]
Evaluation of the Performance of Generative AI Large Language Models ChatGPT, Google Bard, and Microsoft Bing Chat in Supporting Evidence-Based Dentistry: Comparative Mixed Methods Study.

J Med Internet Res. 2023-12-28

[10]
Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs.

NPJ Digit Med. 2024-2-20

Cited By

[1]
Promoting trust and intention to adopt health information generated by ChatGPT among healthcare customers: An empirical study.

Digit Health. 2025-8-28

[2]
Graph retrieval augmented large language models for facial phenotype associated rare genetic disease.

NPJ Digit Med. 2025-8-24

[3]
Foundation models in medicine are a social experiment: time for an ethical framework.

NPJ Digit Med. 2025-8-16

[4]
Evaluating acute image ordering for real-world patient cases via language model alignment with radiological guidelines.

Commun Med (Lond). 2025-8-4

[5]
Harm Reduction Strategies for Thoughtful Use of Large Language Models in the Medical Domain: Perspectives for Patients and Clinicians.

J Med Internet Res. 2025-7-25

[6]
Cognitive bias in clinical large language models.

NPJ Digit Med. 2025-7-10

[7]
Implementing Artificial Intelligence in Critical Care Medicine: a consensus of 22.

Crit Care. 2025-7-8

[8]
Improving the Readability of Institutional Heart Failure-Related Patient Education Materials Using GPT-4: Observational Study.

JMIR Cardio. 2025-7-8

[9]
Framework for bias evaluation in large language models in healthcare settings.

NPJ Digit Med. 2025-7-7

[10]
Digitalizing informed consent in healthcare: a scoping review.

BMC Health Serv Res. 2025-7-2

References

[1]
Performance of ChatGPT as an AI-assisted decision support tool in medicine: a proof-of-concept study for interpreting symptoms and management of common cardiac conditions (AMSTELHEART-2).

Acta Cardiol. 2024-5

[2]
Artificial intelligence and anaesthesia examinations: exploring ChatGPT as a prelude to the future.

Br J Anaesth. 2023-8

[3]
Appropriateness of Breast Cancer Prevention and Screening Recommendations Provided by ChatGPT.

Radiology. 2023-5

[4]
Race and Ethnicity in Pulmonary Function Test Interpretation: An Official American Thoracic Society Statement.

Am J Respir Crit Care Med. 2023-4-15

[5]
Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models.

PLOS Digit Health. 2023-2-9

[6]
A Unifying Approach for GFR Estimation: Recommendations of the NKF-ASN Task Force on Reassessing the Inclusion of Race in Diagnosing Kidney Disease.

Am J Kidney Dis. 2022-2

[7]
Racial bias in pain assessment and treatment recommendations, and false beliefs about biological differences between blacks and whites.

Proc Natl Acad Sci U S A. 2016-4-19

[8]
Higher serum creatinine concentrations in black patients with chronic kidney disease: beyond nutritional status and body composition.

Clin J Am Soc Nephrol. 2008-7

[9]
Caliper-measured skin thickness is similar in white and black women.

J Am Acad Dermatol. 2000-1
