A proof-of-concept study for patient use of open notes with large language models.
Author information
Salmi Liz, Lewis Dana M, Clarke Jennifer L, Dong Zhiyong, Fischmann Rudy, McIntosh Emily I, Sarabu Chethan R, DesRoches Catherine M
Affiliations
Department of Women's and Children's Health, Uppsala University, 752 37 Uppsala, Sweden.
OpenNotes, Beth Israel Deaconess Medical Center, Boston, MA 02215, United States.
Publication information
JAMIA Open. 2025 Apr 9;8(2):ooaf021. doi: 10.1093/jamiaopen/ooaf021. eCollection 2025 Apr.
OBJECTIVES
The use of large language models (LLMs) is growing for both clinicians and patients. While researchers and clinicians have explored LLMs to manage patient portal messages and reduce burnout, there is less documentation about how patients use these tools to understand clinical notes and inform decision-making. This proof-of-concept study examined the reliability and accuracy of LLMs in responding to patient queries based on an open visit note.
MATERIALS AND METHODS
In a cross-sectional proof-of-concept study, 3 commercially available LLMs (ChatGPT 4o, Claude 3 Opus, Gemini 1.5) were evaluated using 4 distinct prompt series, each with multiple questions designed by patients, in response to a single neuro-oncology progress note. LLM responses were scored by the note author (a neuro-oncologist) and a patient who receives care from the note author, using an 8-criterion rubric. Descriptive statistics were used to summarize the performance of each LLM across all prompts.
RESULTS
Overall, the Standard and Persona-based prompt series yielded the best results across all criteria regardless of LLM. ChatGPT 4o using Persona-based prompts scored highest in all categories. All LLMs received low scores on one of the rubric criteria.
DISCUSSION
This proof-of-concept study highlighted the potential for LLMs to assist patients in interpreting open notes. The most effective LLM responses were achieved by applying Persona-based prompts to a patient's question.
CONCLUSION
Optimizing LLMs for patient-driven queries, together with patient education and counseling on LLM use, has the potential to enhance patients' use and understanding of their health information.