A proof-of-concept study for patient use of open notes with large language models.

Authors

Salmi Liz, Lewis Dana M, Clarke Jennifer L, Dong Zhiyong, Fischmann Rudy, McIntosh Emily I, Sarabu Chethan R, DesRoches Catherine M

Affiliations

Department of Women's and Children's Health, Uppsala University, 752 37 Uppsala, Sweden.

OpenNotes, Beth Israel Deaconess Medical Center, Boston, MA 02215, United States.

Publication information

JAMIA Open. 2025 Apr 9;8(2):ooaf021. doi: 10.1093/jamiaopen/ooaf021. eCollection 2025 Apr.

DOI:10.1093/jamiaopen/ooaf021

PMID:40206786

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11980777/

Abstract

OBJECTIVES

The use of large language models (LLMs) is growing for both clinicians and patients. While researchers and clinicians have explored LLMs to manage patient portal messages and reduce burnout, there is less documentation about how patients use these tools to understand clinical notes and inform decision-making. This proof-of-concept study examined the reliability and accuracy of LLMs in responding to patient queries based on an open visit note.

MATERIALS AND METHODS

In a cross-sectional proof-of-concept study, 3 commercially available LLMs (ChatGPT 4o, Claude 3 Opus, Gemini 1.5) were evaluated using 4 distinct prompt series [series names missing in source] with multiple questions, designed by patients, in response to a single neuro-oncology progress note. LLM responses were scored by the note author (a neuro-oncologist) and a patient who receives care from the note author, using an 8-criterion rubric [criterion names missing in source]. Descriptive statistics were used to summarize the performance of each LLM across all prompts.
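As a rough illustration of the descriptive summary described above, the following minimal sketch averages rubric scores per model and prompt series. Everything in it is an assumption for illustration: the record layout, the 1 to 5 score scale, the model labels (`model_a`, `model_b`), the `criterion_1` label, and the score values are hypothetical placeholders, not data or methods from the study. Only the series names "Standard" and "Persona-based" come from the abstract.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical placeholder scores, NOT data from the study.
# Each record: (model, prompt_series, rater, criterion, score on an assumed 1-5 scale)
scores = [
    ("model_a", "Standard", "clinician", "criterion_1", 4),
    ("model_a", "Standard", "patient", "criterion_1", 5),
    ("model_a", "Persona-based", "clinician", "criterion_1", 5),
    ("model_a", "Persona-based", "patient", "criterion_1", 5),
    ("model_b", "Standard", "clinician", "criterion_1", 3),
    ("model_b", "Standard", "patient", "criterion_1", 4),
]


def summarize(records):
    """Return the mean score for each (model, prompt_series) pair."""
    grouped = defaultdict(list)
    for model, series, _rater, _criterion, score in records:
        grouped[(model, series)].append(score)
    return {key: mean(values) for key, values in grouped.items()}


summary = summarize(scores)
for (model, series), average in sorted(summary.items()):
    print(f"{model:8s} {series:14s} mean={average:.2f}")
```

Grouping by a different key, such as criterion or rater, would give the per-criterion or per-rater view in the same way.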

RESULTS

Overall, the Standard and Persona-based prompt series yielded the best results across all criteria regardless of LLM. ChatGPT 4o using Persona-based prompts scored highest in all categories. All LLMs scored low in the use of [criterion name missing in source].

DISCUSSION

This proof-of-concept study highlighted the potential for LLMs to assist patients in interpreting open notes. The most effective LLM responses were achieved by applying [prompt style missing in source]-style prompts to a patient's question.

CONCLUSION

Optimizing LLMs for patient-driven queries, along with patient education and counseling around the use of LLMs, has the potential to enhance patients' use and understanding of their health information.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/02eb/11980777/c8e9f8b75200/ooaf021f1.jpg
