

Racial, ethnic, and sex bias in large language model opioid recommendations for pain management.

Author information

Young Cameron C, Enichen Elizabeth, Rao Arya, Succi Marc D

Author affiliations

Harvard Medical School, Boston, MA, United States.

Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Mass General Brigham, Boston, MA, United States.

Publication information

Pain. 2025 Mar 1;166(3):511-517. doi: 10.1097/j.pain.0000000000003388. Epub 2024 Sep 6.

Abstract

Understanding how large language model (LLM) recommendations vary with patient race/ethnicity provides insight into how LLMs may counter or compound bias in opioid prescription. Forty real-world patient cases were sourced from the MIMIC-IV Note dataset with chief complaints of abdominal pain, back pain, headache, or musculoskeletal pain and amended to include all combinations of race/ethnicity and sex. Large language models were instructed to provide a subjective pain rating and a comprehensive pain management recommendation. Univariate analyses were performed to evaluate the association between racial/ethnic group or sex and the specified outcome measures (subjective pain rating, opioid name, order, and dosage recommendations) suggested by 2 LLMs (GPT-4 and Gemini). Four hundred eighty real-world patient cases were provided to each LLM, and responses included pharmacologic and nonpharmacologic interventions. Tramadol was the most frequently recommended weak opioid (55.4% of cases), while oxycodone was the most frequently recommended strong opioid (33.2% of cases). Relative to GPT-4, Gemini was more likely to rate a patient's pain as "severe" (OR: 0.57, 95% CI: [0.54, 0.60]; P < 0.001), recommend strong opioids (OR: 2.05, 95% CI: [1.59, 2.66]; P < 0.001), and recommend opioids later (OR: 1.41, 95% CI: [1.22, 1.62]; P < 0.001). Race/ethnicity and sex did not influence LLM recommendations. This study suggests that LLMs do not preferentially recommend opioid treatment for one group over another. Given that prior research shows race-based disparities in pain perception and treatment by healthcare providers, LLMs may offer physicians a helpful tool to guide their pain management and ensure equitable treatment across patient groups.
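
The abstract's numbers imply a simple combinatorial expansion: 40 base cases, each amended with every race/ethnicity and sex combination, yield 480 prompts per model, i.e. 12 demographic variants per case. The sketch below illustrates that expansion and the odds-ratio-with-CI arithmetic behind the reported between-model comparisons; the category labels, function names, and counts are illustrative assumptions, not the authors' actual prompt text, code, or data.

```python
from itertools import product
import math

# Illustrative demographic strata; the abstract does not list the exact categories,
# so these are assumptions chosen only to yield 12 combinations (6 x 2).
RACE_ETHNICITY = ["White", "Black", "Hispanic", "Asian",
                  "American Indian/Alaska Native", "Native Hawaiian/Pacific Islander"]
SEX = ["male", "female"]

def expand_cases(base_cases):
    """Amend each base vignette with every race/ethnicity x sex combination."""
    return [
        {"vignette": case, "race_ethnicity": race, "sex": sex}
        for case in base_cases
        for race, sex in product(RACE_ETHNICITY, SEX)
    ]

base_cases = [f"case_{i:02d}" for i in range(40)]  # placeholders for the 40 MIMIC-IV cases
prompts = expand_cases(base_cases)
print(len(prompts))  # 480 prompts per model, matching the abstract (40 x 6 x 2)

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI for a 2x2 table [[a, b], [c, d]]."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, (lo, hi)

# Hypothetical counts of strong-opioid recommendations (not the study's data):
# Gemini 120 yes / 360 no vs GPT-4 70 yes / 410 no.
or_, ci = odds_ratio_with_ci(120, 360, 70, 410)
print(f"OR={or_:.2f}, 95% CI=({ci[0]:.2f}, {ci[1]:.2f})")
```

In the actual study the comparisons were run as univariate analyses across models and across demographic groups; the 2x2 calculation above is only meant to show how a single reported odds ratio and confidence interval are derived.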


