

Sociodemographic biases in medical decision making by large language models.

Author information

Omar Mahmud, Soffer Shelly, Agbareia Reem, Bragazzi Nicola Luigi, Apakama Donald U, Horowitz Carol R, Charney Alexander W, Freeman Robert, Kummer Benjamin, Glicksberg Benjamin S, Nadkarni Girish N, Klang Eyal

Affiliations

The Windreich Department of Artificial Intelligence and Human Health, Icahn School of Medicine at Mount Sinai and the Mount Sinai Health System, New York, NY, USA.

The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai and the Mount Sinai Health System, New York, NY, USA.

Publication information

Nat Med. 2025 Apr 7. doi: 10.1038/s41591-025-03626-6.

DOI: 10.1038/s41591-025-03626-6
PMID: 40195448
Abstract

Large language models (LLMs) show promise in healthcare, but concerns remain that they may produce medically unjustified clinical care recommendations reflecting the influence of patients' sociodemographic characteristics. We evaluated nine LLMs, analyzing over 1.7 million model-generated outputs from 1,000 emergency department cases (500 real and 500 synthetic). Each case was presented in 32 variations (31 sociodemographic groups plus a control) while holding clinical details constant. Compared to both a physician-derived baseline and each model's own control case without sociodemographic identifiers, cases labeled as Black or unhoused or identifying as LGBTQIA+ were more frequently directed toward urgent care, invasive interventions or mental health evaluations. For example, certain cases labeled as being from LGBTQIA+ subgroups were recommended mental health assessments approximately six to seven times more often than clinically indicated. Similarly, cases labeled as having high-income status received significantly more recommendations (P < 0.001) for advanced imaging tests such as computed tomography and magnetic resonance imaging, while low- and middle-income-labeled cases were often limited to basic or no further testing. After applying multiple-hypothesis corrections, these key differences persisted. Their magnitude was not supported by clinical reasoning or guidelines, suggesting that they may reflect model-driven bias, which could eventually lead to health disparities rather than acceptable clinical variation. Our findings, observed in both proprietary and open-source models, underscore the need for robust bias evaluation and mitigation strategies to ensure that LLM-driven medical advice remains equitable and patient centered.
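The audit design described above (one clinical vignette held constant while only a sociodemographic label is swapped in, then each group's recommendation rate compared against a label-free control under a multiple-comparison correction) can be sketched as follows. Everything in this sketch is illustrative, not the study's actual protocol: the group list is truncated (the paper used 31 groups plus a control), `mock_model` is a deliberately biased stub standing in for a real LLM call, and the two-proportion z-test with a Bonferroni correction is a simplified stand-in for the paper's statistics.

```python
import math

# Hypothetical sociodemographic labels; the study used 31 groups plus a control.
GROUPS = ["control", "Black", "unhoused", "LGBTQIA+", "high-income", "low-income"]
CASES = range(40)  # stand-in for the study's 1,000 emergency department cases

def mock_model(case_id, group):
    """Stub for an LLM call on one case variant. Returns True if a
    mental-health evaluation is recommended. Biased on purpose so the
    audit below has something to detect."""
    return group == "LGBTQIA+" or case_id % 4 == 0

def recommendation_rates(model, cases, groups):
    """Rate of positive recommendations per group; clinical details are
    identical across variants, only the label changes."""
    n = len(list(cases))
    return {g: sum(model(c, g) for c in cases) / n for g in groups}

def two_proportion_p(p1, p2, n):
    """Two-sided two-proportion z-test (equal group sizes) via the
    normal approximation: p = erfc(|z| / sqrt(2))."""
    pooled = (p1 + p2) / 2
    if pooled in (0.0, 1.0):
        return 1.0
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    z = abs(p1 - p2) / se
    return math.erfc(z / math.sqrt(2))

n = len(list(CASES))
rates = recommendation_rates(mock_model, CASES, GROUPS)
tests = {g: two_proportion_p(rates[g], rates["control"], n)
         for g in GROUPS if g != "control"}
alpha = 0.05 / len(tests)  # Bonferroni correction over the group comparisons
flagged = [g for g, p in tests.items() if p < alpha]
```

With this stub, only the group the stub was biased toward survives the corrected threshold; groups whose rate matches the control yield z = 0 and are never flagged. The study's analogous comparison is against both a physician-derived baseline and each model's own unlabeled control case.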



