


The Use of Large Language Models in Generating Patient Education Materials: a Scoping Review.

Authors

AlSammarraie Alhasan, Househ Mowafa

Affiliation

College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar.

Publication

Acta Inform Med. 2025;33(1):4-10. doi: 10.5455/aim.2024.33.4-10.

DOI:10.5455/aim.2024.33.4-10
PMID:40223858
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11986337/
Abstract

BACKGROUND

Patient education is a healthcare practice that provides the public with evidence-based medical information, strengthening their ability to lead healthier lives and better manage their conditions. Large language model (LLM) platforms have recently emerged as powerful natural language processing systems capable of producing human-sounding text and, by extension, patient education materials.

OBJECTIVE

This study aims to conduct a scoping review to systematically map the existing literature on the use of LLMs for generating patient education materials.

METHODS

The study followed JBI guidelines, searching five databases using predefined inclusion/exclusion criteria. A retrieval-augmented generation (RAG)-inspired framework was employed to extract the variables, followed by a manual check to verify the accuracy of the extractions. In total, 21 variables were identified and grouped into five themes: Study Demographics, LLM Characteristics, Prompt-Related Variables, PEM Assessment, and Comparative Outcomes.
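The core of a RAG-inspired extraction step is retrieving, for each variable of interest, the passage of an article most likely to contain the answer, then extracting from that passage rather than from the full text. The review does not describe its implementation, so the chunking and bag-of-words scoring below are illustrative assumptions, not the authors' pipeline:

```python
"""Minimal sketch of a RAG-style evidence-retrieval step: for a
variable-specific query, return the article paragraph most similar
to it, to be handed to an LLM (or a human checker) for extraction."""
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    # Lowercased alphabetic tokens with term frequencies.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_evidence(article: str, query: str) -> str:
    # Paragraphs (blank-line separated) serve as retrieval chunks.
    chunks = [p for p in article.split("\n\n") if p.strip()]
    q = tokenize(query)
    return max(chunks, key=lambda c: cosine(tokenize(c), q))


article = (
    "We prompted ChatGPT-4 in English to draft leaflets.\n\n"
    "Readability was scored with the Flesch reading ease formula.\n\n"
    "Two reviewers rated factual accuracy against guidelines."
)
print(retrieve_evidence(article, "Which readability metric was used?"))
# → Readability was scored with the Flesch reading ease formula.
```

A production pipeline would replace the bag-of-words scorer with dense embeddings, but the manual-verification step the review describes stays the same: the retrieved chunk is checked by hand before the extracted value is accepted.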

RESULTS

Results were reported from 69 studies. The United States contributed the largest number of studies. Models such as ChatGPT-4, ChatGPT-3.5, and Bard were the most investigated. Most studies evaluated the accuracy and readability of LLM responses. Only three studies implemented external knowledge bases using a RAG architecture, and all but three studies prompted in English. ChatGPT-4 provided the most accurate responses in comparison with other models.

CONCLUSION

This review examined studies comparing large language models for generating patient education materials. ChatGPT-3.5 and ChatGPT-4 were the most frequently evaluated models. Accuracy and readability of responses were the main evaluation metrics, while few studies used assessment frameworks, retrieval-augmented methods, or explored non-English cases.
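The readability scoring these studies relied on typically uses formulas such as the Flesch reading ease score (Flesch, 1948, "A new readability yardstick", listed in the references). A minimal sketch follows; the vowel-group syllable counter is a rough heuristic assumed for this sketch, not part of the published formula:

```python
"""Flesch reading ease: 206.835 - 1.015*(words/sentences)
- 84.6*(syllables/words). Higher scores mean easier text."""
import re


def count_syllables(word: str) -> int:
    # Approximate syllables as vowel groups, with a crude
    # silent-e correction; always at least one syllable.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)


def flesch_reading_ease(text: str) -> float:
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))


score = flesch_reading_ease("The cat sat on the mat. It was warm.")
# ≈ 117.7 on this toy text: short monosyllabic sentences score as very easy.
```

Patient education materials are commonly targeted at a score of 60 or above (roughly an 8th-grade level), which is why so many of the reviewed studies report this metric for LLM output.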


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1319/11986337/12da957635bb/AIM-33-4-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1319/11986337/72ab1fdb85d9/AIM-33-4-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1319/11986337/fc8e881f094f/AIM-33-4-g003.jpg

Similar Articles

1
The Use of Large Language Models in Generating Patient Education Materials: a Scoping Review.
Acta Inform Med. 2025;33(1):4-10. doi: 10.5455/aim.2024.33.4-10.
2
Proficiency, Clarity, and Objectivity of Large Language Models Versus Specialists' Knowledge on COVID-19's Impacts in Pregnancy: Cross-Sectional Pilot Study.
JMIR Form Res. 2025 Feb 5;9:e56126. doi: 10.2196/56126.
3
Assessing the Application of Large Language Models in Generating Dermatologic Patient Education Materials According to Reading Level: Qualitative Study.
JMIR Dermatol. 2024 May 16;7:e55898. doi: 10.2196/55898.
4
Large Language Models in Worldwide Medical Exams: Platform Development and Comprehensive Analysis.
J Med Internet Res. 2024 Dec 27;26:e66114. doi: 10.2196/66114.
5
Using large language models (ChatGPT, Copilot, PaLM, Bard, and Gemini) in Gross Anatomy course: Comparative analysis.
Clin Anat. 2025 Mar;38(2):200-210. doi: 10.1002/ca.24244. Epub 2024 Nov 21.
6
Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models.
JMIR Med Educ. 2024 Feb 13;10:e51391. doi: 10.2196/51391.
7
Assessing the quality and readability of patient education materials on chemotherapy cardiotoxicity from artificial intelligence chatbots: An observational cross-sectional study.
Medicine (Baltimore). 2025 Apr 11;104(15):e42135. doi: 10.1097/MD.0000000000042135.
8
Benchmarking the performance of large language models in uveitis: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, Google Gemini, and Anthropic Claude3.
Eye (Lond). 2025 Apr;39(6):1132-1137. doi: 10.1038/s41433-024-03545-9. Epub 2024 Dec 17.
9
Evaluating Large Language Models in Dental Anesthesiology: A Comparative Analysis of ChatGPT-4, Claude 3 Opus, and Gemini 1.0 on the Japanese Dental Society of Anesthesiology Board Certification Exam.
Cureus. 2024 Sep 27;16(9):e70302. doi: 10.7759/cureus.70302. eCollection 2024 Sep.
10
Do large language model chatbots perform better than established patient information resources in answering patient questions? A comparative study on melanoma.
Br J Dermatol. 2025 Jan 24;192(2):306-315. doi: 10.1093/bjd/ljae377.

Cited By

1
Large language models in clinical nutrition: an overview of its applications, capabilities, limitations, and potential future prospects.
Front Nutr. 2025 Aug 7;12:1635682. doi: 10.3389/fnut.2025.1635682. eCollection 2025.
2
Development and evaluation of an agentic LLM based RAG framework for evidence-based patient education.
BMJ Health Care Inform. 2025 Jul 25;32(1):e101570. doi: 10.1136/bmjhci-2025-101570.
3
Exploring the possibilities and limitations of customized large language model to support and improve cervical cancer screening.
BMC Med Inform Decis Mak. 2025 Jul 1;25(1):242. doi: 10.1186/s12911-025-03088-3.

References

1
Large language models in patient education: a scoping review of applications in medicine.
Front Med (Lausanne). 2024 Oct 29;11:1477898. doi: 10.3389/fmed.2024.1477898. eCollection 2024.
2
Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A Review.
Entropy (Basel). 2023 Jun 1;25(6):888. doi: 10.3390/e25060888.
3
Virtual Care, Telemedicine Visits, and Real Connection in the Era of COVID-19: Unforeseen Opportunity in the Face of Adversity.
JAMA. 2021 Feb 2;325(5):437-438. doi: 10.1001/jama.2020.27304.
4
COVID-19: health literacy is an underestimated problem.
Lancet Public Health. 2020 May;5(5):e249-e250. doi: 10.1016/S2468-2667(20)30086-4. Epub 2020 Apr 14.
5
A new readability yardstick.
J Appl Psychol. 1948 Jun;32(3):221-33. doi: 10.1037/h0057532.
6
Patient education. American Academy of Family Physicians.
Am Fam Physician. 2000 Oct 1;62(7):1712-4.
7
DISCERN: an instrument for judging the quality of written consumer health information on treatment choices.
J Epidemiol Community Health. 1999 Feb;53(2):105-11. doi: 10.1136/jech.53.2.105.