
A proof-of-concept study for patient use of open notes with large language models.

Authors

Salmi Liz, Lewis Dana M, Clarke Jennifer L, Dong Zhiyong, Fischmann Rudy, McIntosh Emily I, Sarabu Chethan R, DesRoches Catherine M

Affiliations

Department of Women's and Children's Health, Uppsala University, 752 37 Uppsala, Sweden.

OpenNotes, Beth Israel Deaconess Medical Center, Boston, MA 02215, United States.

Publication information

JAMIA Open. 2025 Apr 9;8(2):ooaf021. doi: 10.1093/jamiaopen/ooaf021. eCollection 2025 Apr.

DOI: 10.1093/jamiaopen/ooaf021
PMID: 40206786
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11980777/
Abstract

OBJECTIVES

The use of large language models (LLMs) is growing for both clinicians and patients. While researchers and clinicians have explored LLMs to manage patient portal messages and reduce burnout, there is less documentation about how patients use these tools to understand clinical notes and inform decision-making. This proof-of-concept study examined the reliability and accuracy of LLMs in responding to patient queries based on an open visit note.

MATERIALS AND METHODS

In a cross-sectional proof-of-concept study, 3 commercially available LLMs (ChatGPT 4o, Claude 3 Opus, Gemini 1.5) were evaluated using 4 distinct prompt series with multiple questions, designed by patients, in response to a single neuro-oncology progress note. LLM responses were scored by the note author (a neuro-oncologist) and a patient who receives care from the note author, using an 8-criterion rubric. Descriptive statistics were used to summarize the performance of each LLM across all prompts.
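The descriptive summary mentioned above could be computed along the following lines. This is only a minimal sketch, not the authors' analysis: the record layout, the model, prompt-series, and criterion labels, the 1-5 score range, and every score value are hypothetical placeholders, and none of them come from the study.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical placeholder records: (model, prompt_series, rater, criterion, score).
# All labels and scores are invented for illustration; none come from the study.
ratings = [
    ("model_a", "series_1", "clinician", "criterion_1", 4),
    ("model_a", "series_1", "patient", "criterion_1", 5),
    ("model_a", "series_2", "clinician", "criterion_1", 3),
    ("model_b", "series_1", "clinician", "criterion_1", 2),
    ("model_b", "series_1", "patient", "criterion_1", 4),
]


def summarize(records):
    """Group scores by (model, prompt_series) and report n, mean, and sample SD."""
    groups = defaultdict(list)
    for model, series, _rater, _criterion, score in records:
        groups[(model, series)].append(score)
    summary = {}
    for key, scores in groups.items():
        # The sample SD is undefined for a single score, so report 0.0 in that case.
        sd = stdev(scores) if len(scores) > 1 else 0.0
        summary[key] = {"n": len(scores), "mean": mean(scores), "sd": sd}
    return summary


summary = summarize(ratings)
for key, stats in sorted(summary.items()):
    print(key, stats)
```

Grouping by criterion or by rater instead would only require changing the key built inside `summarize`.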

RESULTS

Overall, the Standard and Persona-based prompt series yielded the best results across all criteria regardless of LLM. ChatGPT 4o using Persona-based prompts scored highest in all categories. All LLMs scored low in the use of [...].

DISCUSSION

This proof-of-concept study highlighted the potential for LLMs to assist patients in interpreting open notes. The most effective LLM responses were achieved by applying Persona-style prompts to a patient's question.

CONCLUSION

Optimizing LLMs for patient-driven queries, and patient education and counseling around the use of LLMs, have potential to enhance patient use and understanding of their health information.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/02eb/11980777/c8e9f8b75200/ooaf021f1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/02eb/11980777/9c6795b7c83a/ooaf021f2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/02eb/11980777/d463c2f8f093/ooaf021f3.jpg
