Custom Large Language Models Improve Accuracy: Comparing Retrieval Augmented Generation and Artificial Intelligence Agents to Noncustom Models for Evidence-Based Medicine.

Authors

Woo Joshua J, Yang Andrew J, Olsen Reena J, Hasan Sayyida S, Nawabi Danyal H, Nwachukwu Benedict U, Williams Riley J, Ramkumar Prem N

Affiliations

Brown University/The Warren Alpert School of Brown University, Providence, Rhode Island, U.S.A.

Tufts University School of Medicine, Boston, Massachusetts, U.S.A.

Publication information

Arthroscopy. 2025 Mar;41(3):565-573.e6. doi: 10.1016/j.arthro.2024.10.042. Epub 2024 Nov 7.


DOI:10.1016/j.arthro.2024.10.042
PMID:39521391
Abstract

PURPOSE: To show the value of custom methods, namely Retrieval Augmented Generation (RAG)-based Large Language Models (LLMs) and Agentic Augmentation, over standard LLMs in delivering accurate information using an anterior cruciate ligament (ACL) injury case.

METHODS: A set of 100 questions and answers based on the 2022 AAOS ACL guidelines was curated. Closed-source models (OpenAI GPT4/GPT 3.5 and Anthropic's Claude3) and open-source models (Llama3 8b/70b and Mistral 8×7b) were asked questions in base form and again with AAOS guidelines embedded into a RAG system. The top-performing models were further augmented with artificial intelligence (AI) agents and reevaluated. Two fellowship-trained surgeons blindly evaluated the accuracy of the responses of each cohort. Recall-Oriented Understudy of Gisting Evaluation and Metric for Evaluation of Translation with Explicit Ordering scores were calculated to assess semantic similarity in the response.

RESULTS: All noncustom LLMs started below 60% accuracy. Applying RAG improved the accuracy of every model by an average of 39.7%. The highest performing model with just RAG was Meta's open-source Llama3 70b (94%). The highest performing model with RAG and AI agents was OpenAI's GPT4 (95%).

CONCLUSIONS: RAG improved accuracy by an average of 39.7%, with the highest accuracy rate of 94% in the Meta Llama3 70b. Incorporating AI agents into a previously RAG-augmented LLM improved the ChatGPT4 accuracy rate to 95%. Thus, Agentic and RAG augmented LLMs can be accurate liaisons of information, supporting our hypothesis.

CLINICAL RELEVANCE: Despite literature surrounding the use of LLMs in medicine, there has been considerable and appropriate skepticism given the variably accurate response rates. This study establishes the groundwork to identify whether custom modifications to LLMs using RAG and agentic augmentation can better deliver accurate information in orthopaedic care. With this knowledge, online medical information commonly sought in popular LLMs, such as ChatGPT, can be standardized and provide relevant online medical information to better support shared decision making between surgeon and patient.
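
The abstract does not describe how the RAG system was built, so the following is only a minimal sketch of the general idea: retrieve the guideline passage most similar to a question and place it in the prompt sent to the model. Everything in it is an assumption made for illustration, including the placeholder passages (which are not AAOS guideline text), the bag-of-words cosine retriever, and the names `PASSAGES`, `retrieve`, and `build_prompt`. The study's actual retriever, embeddings, and prompts are not stated.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder passages; NOT text from the AAOS guidelines.
PASSAGES = [
    "placeholder passage about graft selection for ligament reconstruction",
    "placeholder passage about timing of surgery after injury",
    "placeholder passage about return to sport criteria and rehabilitation",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, k=1):
    """Return the k passages most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages, k=1):
    """Assemble a prompt that grounds the question in retrieved context."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("What is the recommended timing of surgery?", PASSAGES))
```

In the "base form" condition described in the methods, the question alone would be sent to the model. In the RAG condition, something like the output of `build_prompt` would be sent instead. A production system would more likely use dense embeddings and a vector index than word-overlap similarity.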

