BACKGROUND

AI agents built on large language models (LLMs) can plan tasks, use external tools, and coordinate with other agents. Unlike standard LLMs, agents can execute multi-step processes, access real-time clinical information, and integrate multiple data sources. There has been interest in using such agents for clinical and administrative tasks; however, little is known about their performance or about whether multi-agent systems outperform a single agent on healthcare tasks.

PURPOSE

To evaluate the performance of AI agents in healthcare, compare AI agent systems with standard LLMs, and catalog the tools used for task completion.

DATA SOURCES

PubMed, Web of Science, and Scopus from October 1, 2022, through August 5, 2025.

STUDY SELECTION

Peer-reviewed studies implementing AI agents for clinical tasks with quantitative performance comparisons.

DATA EXTRACTION

Two reviewers (A.G., M.O.) independently extracted data on architectures, performance metrics, and clinical applications. Discrepancies were resolved by discussion, with a third reviewer (E.K.) consulted when consensus could not be reached.

DATA SYNTHESIS

Twenty studies met inclusion criteria. Across studies, all agent systems outperformed their baseline LLMs in accuracy. Improvements ranged from small gains to increases of over 60 percentage points, with a median improvement of 53 percentage points in single-agent tool-calling studies. These systems were particularly effective for discrete tasks such as medication dosing and evidence retrieval. Multi-agent systems performed best with up to 5 agents, and their advantage was most pronounced on highly complex tasks. The largest performance gains occurred when the complexity of the AI agent framework matched that of the task.

LIMITATIONS

Heterogeneous outcomes precluded quantitative meta-analysis. Several studies relied on synthetic data, limiting generalizability.

CONCLUSIONS

AI agents consistently improve the clinical task performance of base LLMs when architecture matches task complexity. Our analysis indicates a step-change over base LLMs, with AI agents opening previously inaccessible domains. Future efforts should be based on prospective, multi-center trials using real-world data to determine safety, task matching, and cost-effectiveness.

PRIMARY FUNDING SOURCE

This work was supported in part through the computational and data resources and staff expertise provided by Scientific Computing and Data at the Icahn School of Medicine at Mount Sinai and supported by the Clinical and Translational Science Awards (CTSA) grant UL1TR004419 from the National Center for Advancing Translational Sciences. Research reported in this publication was also supported by the Office of Research Infrastructure of the National Institutes of Health under award numbers S10OD026880 and S10OD030463. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

REGISTRATION

PROSPERO CRD420251120318.

AI Agents in Clinical Medicine: A Systematic Review.
