


Evaluating anti-LGBTQIA+ medical bias in large language models.

Author information

Chang Crystal T, Srivathsa Neha, Bou-Khalil Charbel, Swaminathan Akshay, Lunn Mitchell R, Mishra Kavita, Koyejo Sanmi, Daneshjou Roxana

Affiliations

Department of Dermatology, Stanford University, Stanford, California, United States of America.

Department of Computer Science, Stanford University, Stanford, California, United States of America.

Publication information

PLOS Digit Health. 2025 Sep 8;4(9):e0001001. doi: 10.1371/journal.pdig.0001001. eCollection 2025 Sep.

DOI: 10.1371/journal.pdig.0001001
PMID: 40920790
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12416741/
Abstract

Large Language Models (LLMs) are increasingly deployed in clinical settings for tasks ranging from patient communication to decision support. While these models demonstrate race-based and binary gender biases, anti-LGBTQIA+ bias remains understudied despite documented healthcare disparities affecting these populations. In this work, we evaluated the potential of LLMs to propagate anti-LGBTQIA+ medical bias and misinformation. We prompted 4 LLMs (Gemini 1.5 Flash, Claude 3 Haiku, GPT-4o, Stanford Medicine Secure GPT [GPT-4.0]) with 38 prompts consisting of explicit questions and synthetic clinical notes created by medically-trained reviewers and LGBTQIA+ health experts. The prompts consisted of pairs of prompts with and without LGBTQIA+ identity terms and explored clinical situations across two axes: (i) situations where historical bias has been observed versus not observed, and (ii) situations where LGBTQIA+ identity is relevant to clinical care versus not relevant. Medically-trained reviewers evaluated LLM responses for appropriateness (safety, privacy, hallucination/accuracy, and bias) and clinical utility. We found that all 4 LLMs generated inappropriate responses for prompts with and without LGBTQIA+ identity terms. The proportion of inappropriate responses ranged from 43-62% for prompts mentioning LGBTQIA+ identities versus 47-65% for those without. The most common reason for inappropriate classification tended to be hallucination/accuracy, followed by bias or safety. Qualitatively, we observed differential bias patterns, with LGBTQIA+ prompts eliciting more severe bias. Average clinical utility score for inappropriate responses was lower than for appropriate responses (2.6 versus 3.7 on a 5-point Likert scale). Future work should focus on tailoring output formats to stated use cases, decreasing sycophancy and reliance on extraneous information in the prompt, and improving accuracy and decreasing bias for LGBTQIA+ patients. 
We present our prompts and annotated responses as a benchmark for evaluation of future models. Content warning: This paper includes prompts and model-generated responses that may be offensive.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2a10/12416741/6639a4206378/pdig.0001001.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2a10/12416741/14f077838db5/pdig.0001001.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2a10/12416741/d819db1115d7/pdig.0001001.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2a10/12416741/a9f45101f3c8/pdig.0001001.g004.jpg

Similar articles

1. Evaluating anti-LGBTQIA+ medical bias in large language models.
PLOS Digit Health. 2025 Sep 8;4(9):e0001001. doi: 10.1371/journal.pdig.0001001. eCollection 2025 Sep.
2. Prescription of Controlled Substances: Benefits and Risks
3. Sexual Harassment and Prevention Training
4. Designing Patient-Centered Communication Aids in Pediatric Surgery Using Large Language Models.
J Pediatr Surg. 2025 Sep 8:162654. doi: 10.1016/j.jpedsurg.2025.162654.
5. Aspects of Genetic Diversity, Host Specificity and Public Health Significance of Single-Celled Intestinal Parasites Commonly Observed in Humans and Mostly Referred to as 'Non-Pathogenic'.
APMIS. 2025 Sep;133(9):e70036. doi: 10.1111/apm.70036.
6. Improving Large Language Models' Summarization Accuracy by Adding Highlights to Discharge Notes: Comparative Evaluation.
JMIR Med Inform. 2025 Jul 24;13:e66476. doi: 10.2196/66476.
7. Large Language Model Symptom Identification From Clinical Text: Multicenter Study.
J Med Internet Res. 2025 Jul 31;27:e72984. doi: 10.2196/72984.
8. Rapidly Benchmarking Large Language Models for Diagnosing Comorbid Patients: Comparative Study Leveraging the LLM-as-a-Judge Method.
JMIRx Med. 2025 Aug 29;6:e67661. doi: 10.2196/67661.
9. Supports and Barriers to Inclusive Workplaces for LGBTQIA+ Autistic Adults in the United States.
Autism Adulthood. 2024 Dec 2;6(4):485-494. doi: 10.1089/aut.2022.0092. eCollection 2024 Dec.
10. Improving Energy Access, Climate and Socio-Economic Outcomes Through Off-Grid Electrification Technologies: A Systematic Review.
Campbell Syst Rev. 2025 Aug 15;21(3):e70060. doi: 10.1002/cl2.70060. eCollection 2025 Sep.

References cited in this article

1. Mitigating the risk of health inequity exacerbated by large language models.
NPJ Digit Med. 2025 May 4;8(1):246. doi: 10.1038/s41746-025-01576-4.
2. Sociodemographic biases in medical decision making by large language models.
Nat Med. 2025 Apr 7. doi: 10.1038/s41591-025-03626-6.
3. LLM-Guided Pain Management: Examining Socio-Demographic Gaps in Cancer vs non-Cancer cases.
medRxiv. 2025 Mar 5:2025.03.04.25323396. doi: 10.1101/2025.03.04.25323396.
4. Evaluating and addressing demographic disparities in medical large language models: a systematic review.
Int J Equity Health. 2025 Feb 26;24(1):57. doi: 10.1186/s12939-025-02419-0.
5. CDC Clinical Guidelines on the Use of Doxycycline Postexposure Prophylaxis for Bacterial Sexually Transmitted Infection Prevention, United States, 2024.
MMWR Recomm Rep. 2024 Jun 6;73(2):1-8. doi: 10.15585/mmwr.rr7302a1.
6. Artificial Intelligence-Generated Draft Replies to Patient Inbox Messages.
JAMA Netw Open. 2024 Mar 4;7(3):e243201. doi: 10.1001/jamanetworkopen.2024.3201.
7. Large language models propagate race-based medicine.
NPJ Digit Med. 2023 Oct 20;6(1):195. doi: 10.1038/s41746-023-00939-z.
8. Gender-Affirming Care of Transgender and Gender-Diverse Youth: Current Concepts.
Annu Rev Med. 2023 Jan 27;74:107-116. doi: 10.1146/annurev-med-043021-032007. Epub 2022 Oct 19.
9. The effect of gender-affirming hormone treatment on serum creatinine in transgender and gender-diverse youth: implications for estimating GFR.
Pediatr Nephrol. 2022 Sep;37(9):2141-2150. doi: 10.1007/s00467-022-05445-0. Epub 2022 Jan 26.
10. Androgenetic alopecia in transgender and gender diverse populations: A review of therapeutics.
J Am Acad Dermatol. 2023 Oct;89(4):774-783. doi: 10.1016/j.jaad.2021.08.067. Epub 2021 Oct 28.