Development and evaluation of an agentic LLM based RAG framework for evidence-based patient education.

Authors

AlSammarraie AlHasan, Al-Saifi Ali, Kamhia Hassan, Aboagla Mohamed, Househ Mowafa

Affiliations

Hamad Bin Khalifa University College of Science and Engineering, Doha, Qatar

Applab, Doha, Qatar.

Publication information

BMJ Health Care Inform. 2025 Jul 25;32(1):e101570. doi: 10.1136/bmjhci-2025-101570.

Abstract

OBJECTIVES

To develop and evaluate an agentic retrieval-augmented generation (ARAG) framework using open-source large language models (LLMs) for generating evidence-based Arabic patient education materials (PEMs), and to assess the LLMs' capabilities as validation agents tasked with blocking harmful content.

METHODS

We selected 12 LLMs and applied four experimental setups (base, base+prompt engineering, ARAG, and ARAG+prompt engineering). PEM generation quality was assessed in a two-stage evaluation (automated LLM evaluation, then expert review) using five metrics (accuracy, readability, comprehensiveness, appropriateness and safety) against ground truth. Validation agent (VA) performance was evaluated separately on a dataset of harmful and safe PEMs by measuring blocking accuracy.
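
The abstract does not define how blocking accuracy is computed. As a minimal sketch, assuming it is the ordinary share of correct decisions (harmful PEMs blocked, safe PEMs passed), it could be scored as below. The `Decision` type, the `blocking_accuracy` function and the toy data are hypothetical illustrations, not the authors' code or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """One validation-agent verdict on one PEM (hypothetical structure)."""
    is_harmful: bool  # ground-truth label of the PEM
    blocked: bool     # whether the validation agent blocked it


def blocking_accuracy(decisions):
    """Fraction of PEMs handled correctly: harmful blocked, safe passed.

    Assumption: plain accuracy over the labelled dataset; the study may
    define the metric differently.
    """
    if not decisions:
        raise ValueError("no decisions to score")
    correct = sum(d.is_harmful == d.blocked for d in decisions)
    return correct / len(decisions)


# Toy data for illustration only, not taken from the study.
sample = [
    Decision(is_harmful=True, blocked=True),    # correct block
    Decision(is_harmful=True, blocked=False),   # missed harmful PEM
    Decision(is_harmful=False, blocked=False),  # correct pass
    Decision(is_harmful=False, blocked=False),  # correct pass
    Decision(is_harmful=False, blocked=True),   # safe PEM wrongly blocked
]

print(blocking_accuracy(sample))  # 3 of 5 decisions are correct
```

Under this assumed definition, a missed harmful PEM and a wrongly blocked safe PEM count equally, so a safety-focused evaluation might also report the two error types separately.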

RESULTS

ARAG-enabled setups yielded the best generation performance for 10/12 LLMs. Arabic-focused models occupied the top 9 ranks. The expert evaluation ranking mirrored the automated ranking. AceGPT-v2-32B with ARAG and prompt engineering (setup 4) was confirmed as the highest-performing configuration. VA accuracy correlated strongly with model size; only models with ≥27B parameters achieved >0.80 accuracy. Fanar-7B performed well in generation but poorly as a VA.

DISCUSSION

Arabic-centred models demonstrated advantages for the Arabic PEM generation task. ARAG enhanced generation quality, although context limits impacted large-context models. The validation task highlighted model size as critical for reliable performance.

CONCLUSION

ARAG noticeably improves Arabic PEM generation, particularly with Arabic-centred models like AceGPT-v2-32B. Larger models appear necessary for reliable harmful content validation. Automated evaluation showed potential for ranking systems, aligning with expert judgement for top performers.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9bcd/12306375/c96ba55097a8/bmjhci-32-1-g001.jpg
