Orchestrated multi agents sustain accuracy under clinical-scale workloads compared to a single agent.

Authors

Klang Eyal, Omar Mahmud, Raut Ganesh, Agbareia Reem, Timsina Prem, Freeman Robert, Gavin Nicholas, Stump Lisa, Charney Alexander W, Glicksberg Benjamin S, Nadkarni Girish N

Affiliations

The Windreich Department of Artificial Intelligence and Human Health, Mount Sinai Medical Center, NY, USA.

The Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Publication information

medRxiv. 2025 Aug 24:2025.08.22.25334049. doi: 10.1101/2025.08.22.25334049.

DOI: 10.1101/2025.08.22.25334049
PMID: 40894146
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12393657/
Abstract

We tested state-of-the-art large language models (LLMs) in two configurations for clinical-scale workloads: a single agent handling heterogeneous tasks versus an orchestrated multi-agent system assigning each task to a dedicated worker. Across retrieval, extraction, and dosing calculations, we varied batch sizes from 5 to 80 to simulate clinical traffic. Multi-agent runs maintained high accuracy under load (pooled accuracy 90.6% at 5 tasks, 65.3% at 80) while single-agent accuracy fell sharply (73.1% to 16.6%), with significant differences beyond 10 tasks (FDR-adjusted p < 0.01). Multi-agent execution reduced token usage up to 65-fold and limited latency growth compared with single-agent runs. The design's isolation of tasks prevented context interference and preserved performance across four diverse LLM checkpoints. This is the first evaluation of LLM agent architectures under sustained, mixed-task clinical workloads, showing that lightweight orchestration can deliver accuracy, efficiency, and auditability at operational scale.
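
The contrast between the two configurations can be sketched in a few lines. This is a minimal illustration of task isolation, not the authors' implementation: the `Task` type, the worker prompts, the whitespace token counter, and the batch contents are all hypothetical, and no LLM is called. It only shows that a single shared context grows with batch size, while a routed, per-task context does not.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Task:
    kind: str     # "retrieval", "extraction", or "dosing"
    payload: str  # made-up task text, for illustration only


def count_tokens(text: str) -> int:
    # Crude whitespace count standing in for a real tokenizer (assumption).
    return len(text.split())


def make_worker(kind: str) -> Callable[[Task], str]:
    # A dedicated worker builds a prompt that contains only its own task.
    def build_prompt(task: Task) -> str:
        return f"You handle {kind} tasks only. Task: {task.payload}"
    return build_prompt


WORKERS: Dict[str, Callable[[Task], str]] = {
    kind: make_worker(kind) for kind in ("retrieval", "extraction", "dosing")
}


def single_agent_prompts(batch: List[Task]) -> List[str]:
    # Single agent: the whole heterogeneous batch shares one context.
    body = "\n".join(f"[{t.kind}] {t.payload}" for t in batch)
    return [f"You handle all tasks below.\n{body}"]


def orchestrated_prompts(batch: List[Task]) -> List[str]:
    # Orchestrator: route each task to its worker, one isolated prompt per task.
    return [WORKERS[t.kind](t) for t in batch]


def make_batch(size: int) -> List[Task]:
    # Hypothetical mixed batch cycling through the three task types.
    kinds = ("retrieval", "extraction", "dosing")
    return [Task(kinds[i % 3], f"example item number {i}") for i in range(size)]


def largest_context(prompts: List[str]) -> int:
    # Size of the biggest single prompt any one model call would see.
    return max(count_tokens(p) for p in prompts)


if __name__ == "__main__":
    # Batch sizes 5 and 80 are the endpoints reported in the abstract.
    for size in (5, 80):
        batch = make_batch(size)
        print(size,
              largest_context(single_agent_prompts(batch)),
              largest_context(orchestrated_prompts(batch)))
```

The sketch measures only per-call context size. It does not reproduce the accuracy, token-usage, or latency figures reported in the abstract, which depend on the actual models and task sets.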

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fa1d/12393657/b81ed207f0ea/nihpp-2025.08.22.25334049v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fa1d/12393657/1d0a2356af88/nihpp-2025.08.22.25334049v1-f0001.jpg

Similar articles

1
Orchestrated multi agents sustain accuracy under clinical-scale workloads compared to a single agent.
medRxiv. 2025 Aug 24:2025.08.22.25334049. doi: 10.1101/2025.08.22.25334049.
2
AI Agents in Clinical Medicine: A Systematic Review.
medRxiv. 2025 Aug 26:2025.08.22.25334232. doi: 10.1101/2025.08.22.25334232.
3
Prescription of Controlled Substances: Benefits and Risks
4
Implementing Large Language Models in Health Care: Clinician-Focused Review With Interactive Guideline.
J Med Internet Res. 2025 Jul 11;27:e71916. doi: 10.2196/71916.
5
RAMIE: retrieval-augmented multi-task information extraction with large language models on dietary supplements.
J Am Med Inform Assoc. 2025 Mar 1;32(3):545-554. doi: 10.1093/jamia/ocaf002.
6
Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning.
J Biomed Semantics. 2025 May 6;16(1):8. doi: 10.1186/s13326-025-00329-2.
7
Using a Diverse Test Suite to Assess Large Language Models on Fast Health Care Interoperability Resources Knowledge: Comparative Analysis.
J Med Internet Res. 2025 Aug 12;27:e73540. doi: 10.2196/73540.
8
Advancing health coaching: A comparative study of large language model and health coaches.
Artif Intell Med. 2024 Nov;157:103004. doi: 10.1016/j.artmed.2024.103004. Epub 2024 Oct 19.
9
Detecting Stigmatizing Language in Clinical Notes with Large Language Models for Addiction Care.
medRxiv. 2025 Aug 12:2025.08.08.25333315. doi: 10.1101/2025.08.08.25333315.
10
Evaluating and Improving Syndrome Differentiation Thinking Ability in Large Language Models: Method Development Study.
JMIR Med Inform. 2025 Jun 20;13:e75103. doi: 10.2196/75103.
