

Large language models propagate race-based medicine.

Authors

Omiye Jesutofunmi A, Lester Jenna C, Spichak Simon, Rotemberg Veronica, Daneshjou Roxana

Affiliations

Department of Dermatology, Stanford School of Medicine, Stanford, CA, USA.

Department of Biomedical Data Science, Stanford School of Medicine, Stanford, CA, USA.

Publication

NPJ Digit Med. 2023 Oct 20;6(1):195. doi: 10.1038/s41746-023-00939-z.

DOI: 10.1038/s41746-023-00939-z
PMID: 37864012
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10589311/
Abstract

Large language models (LLMs) are being integrated into healthcare systems, but these models may recapitulate harmful, race-based medicine. The objective of this study is to assess whether four commercially available large language models (LLMs) propagate harmful, inaccurate, race-based content when responding to eight different scenarios that check for race-based medicine or widespread misconceptions around race. Questions were derived from discussions among four physician experts and prior work on race-based medical misconceptions believed by medical trainees. We assessed four large language models with nine different questions that were interrogated five times each, for a total of 45 responses per model. All models had examples of perpetuating race-based medicine in their responses. Models were not always consistent in their responses when asked the same question repeatedly. LLMs are being proposed for use in the healthcare setting, with some models already connecting to electronic health record systems. However, our findings show that these LLMs could potentially cause harm by perpetuating debunked, racist ideas.
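The evaluation protocol the abstract describes — four models, nine questions, five repetitions each, for 45 responses per model — can be sketched as a simple collection loop. This is only an illustrative sketch, not the authors' code: the model names, question labels, and the `query_model` function are placeholders standing in for real API calls.

```python
# Sketch of the repeated-query protocol from the abstract: each of four models
# is asked each of nine questions five times (45 responses per model).
# MODELS, QUESTIONS, and query_model are illustrative placeholders.

MODELS = ["model_a", "model_b", "model_c", "model_d"]   # four commercial LLMs
QUESTIONS = [f"question_{i}" for i in range(1, 10)]     # nine probe questions
RUNS_PER_QUESTION = 5                                   # repeats per question

def query_model(model: str, question: str, run: int) -> str:
    # Placeholder: a real implementation would call the model's API here.
    return f"{model} response to {question} (run {run})"

def collect_responses() -> dict[str, list[str]]:
    # Gather every (question, run) response for each model.
    responses: dict[str, list[str]] = {}
    for model in MODELS:
        responses[model] = [
            query_model(model, q, run)
            for q in QUESTIONS
            for run in range(RUNS_PER_QUESTION)
        ]
    return responses

responses = collect_responses()
# 9 questions x 5 runs = 45 responses per model, as reported in the abstract.
assert all(len(v) == 45 for v in responses.values())
```

Repeating each question is what lets the study observe the inconsistency the abstract notes: the same model can answer the same question differently across runs.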


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ac4/10589311/9ce8fa30b67e/41746_2023_939_Fig1_HTML.jpg

Similar Articles

1. Large language models propagate race-based medicine. NPJ Digit Med. 2023 Oct 20;6(1):195. doi: 10.1038/s41746-023-00939-z.
2. Quality of Answers of Generative Large Language Models Versus Peer Users for Interpreting Laboratory Test Results for Lay Patients: Evaluation Study. J Med Internet Res. 2024 Apr 17;26:e56655. doi: 10.2196/56655.
3. Large Language Models Can Enable Inductive Thematic Analysis of a Social Media Corpus in a Single Prompt: Human Validation Study. JMIR Infodemiology. 2024 Aug 29;4:e59641. doi: 10.2196/59641.
4. Quality of Answers of Generative Large Language Models vs Peer Patients for Interpreting Lab Test Results for Lay Patients: Evaluation Study. ArXiv. 2024 Jan 23:arXiv:2402.01693v1.
5. Assessing the research landscape and clinical utility of large language models: a scoping review. BMC Med Inform Decis Mak. 2024 Mar 12;24(1):72. doi: 10.1186/s12911-024-02459-6.
6. On the role of the UMLS in supporting diagnosis generation proposed by Large Language Models. J Biomed Inform. 2024 Sep;157:104707. doi: 10.1016/j.jbi.2024.104707. Epub 2024 Aug 13.
7. Large Language Models and Medical Education: Preparing for a Rapid Transformation in How Trainees Will Learn to Be Doctors. ATS Sch. 2023 Jun 14;4(3):282-292. doi: 10.34197/ats-scholar.2023-0036PS. eCollection 2023 Sep.
8. Triage Performance Across Large Language Models, ChatGPT, and Untrained Doctors in Emergency Medicine: Comparative Study. J Med Internet Res. 2024 Jun 14;26:e53297. doi: 10.2196/53297.
9. Evaluation of the Performance of Generative AI Large Language Models ChatGPT, Google Bard, and Microsoft Bing Chat in Supporting Evidence-Based Dentistry: Comparative Mixed Methods Study. J Med Internet Res. 2023 Dec 28;25:e51580. doi: 10.2196/51580.
10. Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs. NPJ Digit Med. 2024 Feb 20;7(1):41. doi: 10.1038/s41746-024-01029-4.

Cited By

1. Promoting trust and intention to adopt health information generated by ChatGPT among healthcare customers: An empirical study. Digit Health. 2025 Aug 28;11:20552076251374121. doi: 10.1177/20552076251374121. eCollection 2025 Jan-Dec.
2. Graph retrieval augmented large language models for facial phenotype associated rare genetic disease. NPJ Digit Med. 2025 Aug 24;8(1):543. doi: 10.1038/s41746-025-01955-x.
3. Foundation models in medicine are a social experiment: time for an ethical framework. NPJ Digit Med. 2025 Aug 16;8(1):525. doi: 10.1038/s41746-025-01924-4.
4. Evaluating acute image ordering for real-world patient cases via language model alignment with radiological guidelines. Commun Med (Lond). 2025 Aug 4;5(1):332. doi: 10.1038/s43856-025-01061-9.
5. Harm Reduction Strategies for Thoughtful Use of Large Language Models in the Medical Domain: Perspectives for Patients and Clinicians. J Med Internet Res. 2025 Jul 25;27:e75849. doi: 10.2196/75849.
6. Cognitive bias in clinical large language models. NPJ Digit Med. 2025 Jul 10;8(1):428. doi: 10.1038/s41746-025-01790-0.
7. Implementing Artificial Intelligence in Critical Care Medicine: a consensus of 22. Crit Care. 2025 Jul 8;29(1):290. doi: 10.1186/s13054-025-05532-2.
8. Improving the Readability of Institutional Heart Failure-Related Patient Education Materials Using GPT-4: Observational Study. JMIR Cardio. 2025 Jul 8;9:e68817. doi: 10.2196/68817.
9. Framework for bias evaluation in large language models in healthcare settings. NPJ Digit Med. 2025 Jul 7;8(1):414. doi: 10.1038/s41746-025-01786-w.
10. Digitalizing informed consent in healthcare: a scoping review. BMC Health Serv Res. 2025 Jul 2;25(1):893. doi: 10.1186/s12913-025-12964-7.

References

1. Performance of ChatGPT as an AI-assisted decision support tool in medicine: a proof-of-concept study for interpreting symptoms and management of common cardiac conditions (AMSTELHEART-2). Acta Cardiol. 2024 May;79(3):358-366. doi: 10.1080/00015385.2024.2303528. Epub 2024 Feb 13.
2. Artificial intelligence and anaesthesia examinations: exploring ChatGPT as a prelude to the future. Br J Anaesth. 2023 Aug;131(2):e36-e37. doi: 10.1016/j.bja.2023.04.033. Epub 2023 May 26.
3. Appropriateness of Breast Cancer Prevention and Screening Recommendations Provided by ChatGPT. Radiology. 2023 May;307(4):e230424. doi: 10.1148/radiol.230424. Epub 2023 Apr 4.
4. Race and Ethnicity in Pulmonary Function Test Interpretation: An Official American Thoracic Society Statement. Am J Respir Crit Care Med. 2023 Apr 15;207(8):978-995. doi: 10.1164/rccm.202302-0310ST.
5. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023 Feb 9;2(2):e0000198. doi: 10.1371/journal.pdig.0000198. eCollection 2023 Feb.
6. A Unifying Approach for GFR Estimation: Recommendations of the NKF-ASN Task Force on Reassessing the Inclusion of Race in Diagnosing Kidney Disease. Am J Kidney Dis. 2022 Feb;79(2):268-288.e1. doi: 10.1053/j.ajkd.2021.08.003. Epub 2021 Sep 23.
7. Racial bias in pain assessment and treatment recommendations, and false beliefs about biological differences between blacks and whites. Proc Natl Acad Sci U S A. 2016 Apr 19;113(16):4296-301. doi: 10.1073/pnas.1516047113. Epub 2016 Apr 4.
8. Higher serum creatinine concentrations in black patients with chronic kidney disease: beyond nutritional status and body composition. Clin J Am Soc Nephrol. 2008 Jul;3(4):992-7. doi: 10.2215/CJN.00090108. Epub 2008 Apr 16.
9. Caliper-measured skin thickness is similar in white and black women. J Am Acad Dermatol. 2000 Jan;42(1 Pt 1):76-9. doi: 10.1016/s0190-9622(00)90012-4.