Enhancing Patient-Physician Communication: Simulating African American Vernacular English in Medical Diagnostics with Large Language Models.

Author information

Lee Yeawon, Chang Chia-Hsuan, Yang Christopher C

Affiliations

Drexel University, Philadelphia, PA 19104 USA.

Yale University, New Haven, CT 06510 USA.

Publication information

J Healthc Inform Res. 2025 Mar 11;9(2):119-153. doi: 10.1007/s41666-025-00194-9. eCollection 2025 Jun.

DOI:10.1007/s41666-025-00194-9

PMID:40309129

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12037967/

Abstract

Effective communication is crucial in reducing health disparities. However, linguistic differences, such as the use of African American Vernacular English (AAVE), can lead to communication gaps between patients and physicians, negatively affecting care and outcomes. This study examines whether large language models (LLMs), specifically GPT-4 and Llama 3.3, can replicate AAVE in simulated clinical dialogues to improve cultural sensitivity. We tested four prompt types (BaseP, DemoP, LingP, and CompP) using United States Medical Licensing Examination (USMLE) case simulations. Statistical analyses of the models' outputs showed a significant difference among prompt types for both GPT-4 (F(2,70) = 6.218, p = 0.003) and Llama 3.3 (F(2,70) = 12.124, p < 0.001), indicating that including demographic information and/or explicit AAVE cues influences each model's output. Combining demographic and linguistic cues (CompP) yielded the highest mean AAVE feature counts (e.g., 9.83 for GPT-4 vs. 16.06 for Llama 3.3), although neither model fully captured the diversity of AAVE. Moreover, simply mentioning African American demographics triggered extra informal forms, suggesting built-in stereotypes or biases in both models. Overall, these findings highlight the promise of LLMs for culturally sensitive healthcare communication, while underscoring the need for continued refinement to address stereotypes and more accurately represent diverse linguistic styles.

SUPPLEMENTARY INFORMATION

The online version contains supplementary material available at 10.1007/s41666-025-00194-9.
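
For readers unfamiliar with the reported statistics, the sketch below shows one way degrees of freedom of (2,70) can arise. The abstract does not name the test, so this is an assumption: a one-way repeated-measures ANOVA over three conditions and 36 cases gives (3 - 1) = 2 and (3 - 1)(36 - 1) = 70. The function `rm_anova`, the condition means, and all scores are hypothetical and synthetic; none of the numbers come from the study.

```python
# Hypothetical illustration only: synthetic scores, NOT data from the study.
# Assumes a one-way repeated-measures ANOVA, which the abstract does not confirm.
import random


def rm_anova(scores):
    """One-way repeated-measures ANOVA.

    scores: list of rows, one per subject (case), each holding one value
    per condition. Returns (F, df_effect, df_error).
    """
    n = len(scores)      # number of subjects
    k = len(scores[0])   # number of conditions
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    # Partition total variability into condition, subject, and residual parts.
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_cond - ss_subj

    df_effect = k - 1
    df_error = (k - 1) * (n - 1)
    f_stat = (ss_cond / df_effect) / (ss_error / df_error)
    return f_stat, df_effect, df_error


random.seed(0)
# 36 simulated cases x 3 hypothetical conditions with made-up means.
scores = [[random.gauss(mu, 3.0) for mu in (6.0, 8.0, 10.0)] for _ in range(36)]
f_stat, df1, df2 = rm_anova(scores)
print(f"F({df1},{df2}) = {f_stat:.3f}")
```

A between-subjects design with 73 observations in three groups would also yield (2,70), so the degrees of freedom alone do not identify the design; the full text should be consulted for the actual procedure.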
