AlSammarraie Alhasan, Househ Mowafa
Faculty College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar.
Acta Inform Med. 2025;33(1):4-10. doi: 10.5455/aim.2024.33.4-10.
Patient education is a healthcare concept that involves providing the public with evidence-based medical information. This information enhances their ability to lead healthier lives and better manage their conditions. Large language model (LLM) platforms have recently been introduced as powerful natural language processing (NLP) tools capable of producing human-sounding text and, by extension, patient education materials (PEMs).
This study aims to conduct a scoping review to systematically map the existing literature on the use of LLMs for generating patient education materials.
The study followed Joanna Briggs Institute (JBI) guidelines, searching five databases using predefined inclusion and exclusion criteria. A retrieval-augmented generation (RAG)-inspired framework was employed to extract the variables, followed by a manual check to verify the accuracy of the extractions. In total, 21 variables were identified and grouped into five themes: Study Demographics, LLM Characteristics, Prompt-Related Variables, PEM Assessment, and Comparative Outcomes.
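The retrieval step of such an extraction framework can be illustrated with a minimal sketch. The abstract does not describe the framework's internals, so everything below is an assumption made for illustration: the bag-of-words cosine scoring, the function names, the variable queries, and the three sample passages (which are invented and not taken from any reviewed study). In a full pipeline, the retrieved passage would be passed to an LLM to extract the value, and a human reviewer would then verify it.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(passages, query, k=1):
    """Return the k passages most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        passages,
        key=lambda p: cosine(Counter(tokenize(p)), query_vec),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical passages from one study (invented for illustration).
passages = [
    "Prompts were submitted to ChatGPT-4 and Bard in March.",
    "Readability was scored with the Flesch-Kincaid grade level.",
    "Two surgeons independently rated the accuracy of each response.",
]

# Hypothetical variables to extract, each phrased as a retrieval query.
variables = {
    "llm_model": "prompts submitted to which model",
    "readability_tool": "readability score or grade level used",
}

# For each variable, keep the best-matching passage as evidence for
# the later generation and manual verification steps (not shown).
evidence = {name: retrieve(passages, query)[0] for name, query in variables.items()}
```

Keeping the retrieved passage alongside each extracted value gives the manual check a concrete piece of text to verify against.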
Results were reported from 69 studies. The United States contributed the largest number of studies. ChatGPT-4, ChatGPT-3.5, and Bard were the most frequently investigated LLMs. Most studies evaluated the accuracy and readability of LLM responses. Only 3 studies incorporated external knowledge bases through a RAG architecture. All but 3 studies conducted prompting in English. ChatGPT-4 was found to provide the most accurate responses compared with the other models.
This review examined studies comparing large language models for generating patient education materials. ChatGPT-3.5 and ChatGPT-4 were the most evaluated. Accuracy and readability of responses were the main metrics of evaluation, while few studies used assessment frameworks, retrieval-augmented methods, or explored non-English cases.
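As an illustration of how the readability of a response might be scored, the sketch below computes the Flesch-Kincaid grade level, one widely used readability formula. The abstract does not state which readability measures the reviewed studies applied, so the choice of formula is an assumption, as are the function names. The syllable counter is a rough vowel-group heuristic, not a dictionary-based count.

```python
import re


def count_syllables(word):
    """Approximate syllables as runs of vowels (rough heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


# Hypothetical example texts (invented for illustration).
plain = "Take one pill each day. Call us if you feel sick."
dense = (
    "Administer the anticoagulant medication concomitantly "
    "with comprehensive hematological monitoring."
)

plain_grade = flesch_kincaid_grade(plain)
dense_grade = flesch_kincaid_grade(dense)
```

A lower grade indicates text that is easier to read, so the plain-language instruction should score well below the jargon-heavy one.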