Retrieval augmented generation for 10 large language models and its generalizability in assessing medical fitness.

Ke Yu He, Jin Liyuan, Elangovan Kabilan, Abdullah Hairil Rizal, Liu Nan, Sia Alex Tiong Heng, Soh Chai Rick, Tung Joshua Yi Min, Ong Jasmine Chiat Ling, Kuo Chang-Fu, Wu Shao-Chun, Kovacheva Vesela P, Ting Daniel Shu Wei

Department of Anesthesiology, Singapore General Hospital, Singapore, Singapore.

Data Science and Artificial Intelligence Lab, Singapore General Hospital, Singapore, Singapore.

NPJ Digit Med. 2025 Apr 5;8(1):187. doi: 10.1038/s41746-025-01519-z.

Large Language Models (LLMs) hold promise for medical applications but often lack domain-specific expertise. Retrieval Augmented Generation (RAG) enables customization by integrating specialized knowledge. This study assessed the accuracy, consistency, and safety of LLM-RAG models in determining surgical fitness and delivering preoperative instructions using 35 local and 23 international guidelines. Ten LLMs (e.g., GPT3.5, GPT4, GPT4o, Gemini, Llama2, Llama3, and Claude) were tested across 14 clinical scenarios. A total of 3234 responses were generated and compared to 448 human-generated answers. The GPT4 LLM-RAG model with international guidelines generated answers within 20 s and achieved the highest accuracy, which was significantly better than that of human-generated responses (96.4% vs. 86.6%, p = 0.016). Additionally, the model produced no hallucinations and gave more consistent output than humans. This study underscores the potential of GPT4-based LLM-RAG models to deliver highly accurate, efficient, and consistent preoperative assessments.
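The abstract does not describe how retrieval was implemented, so the following is only a generic sketch of the RAG idea it names: select the guideline excerpts most relevant to a question and place them in the prompt before the LLM answers. Everything here is an assumption made for illustration. The function names (`retrieve`, `build_prompt`), the bag-of-words cosine scoring (real systems typically use learned embeddings), and the placeholder excerpts are hypothetical and are not taken from the study or from any real guideline.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks whose word counts are most similar to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


def build_prompt(query, chunks, k=2):
    """Assemble a prompt that restricts the model to the retrieved excerpts."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return (
        "Use only the guideline excerpts below.\n"
        f"{context}\n\nQuestion: {query}"
    )


# Hypothetical placeholder text, NOT real clinical guidance.
GUIDELINE_CHUNKS = [
    "Placeholder excerpt about fasting before elective surgery.",
    "Placeholder excerpt about anticoagulant medication before surgery.",
    "Placeholder excerpt about postoperative wound care.",
]

if __name__ == "__main__":
    question = "Should the patient stop anticoagulant medication?"
    print(build_prompt(question, GUIDELINE_CHUNKS, k=1))
```

The resulting prompt string would then be sent to whichever LLM is under evaluation. That call is omitted because the abstract gives no details about the model interface.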
