Optimizing Order Sets With a Large Language Model-Powered Multiagent System.

Authors

Liu Siru, Huang Sean S, McCoy Allison B, Wright Aileen P, Horst Sara, Wright Adam

Affiliations

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee.

Department of Computer Science, Vanderbilt University, Nashville, Tennessee.

Publication information

JAMA Netw Open. 2025 Sep 2;8(9):e2533277. doi: 10.1001/jamanetworkopen.2025.33277.

DOI:10.1001/jamanetworkopen.2025.33277
PMID:40986301
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12457977/
Abstract

IMPORTANCE

Optimizing order sets is vital to enhance clinical decision support and improve patient care. Manual review is resource intensive and cannot identify potential improvements in order sets in a timely manner.

OBJECTIVE

To develop and evaluate the utility of a large language model (LLM)-powered multiagent system in optimizing order sets.

DESIGN, SETTING, AND PARTICIPANTS

A multiagent system was developed and evaluated between January 1, 2024, and December 31, 2024, which comprised agents for content critique, dynamic search, knowledge retrieval, medication verification, and suggestion summarization. A filter was developed to align suggestion usefulness scores with expert preferences. Experiment 1 evaluated 735 generated suggestions from a multiagent system developed for optimizing order sets, which were assessed by 3 physicians for 9 order sets and by 1 physician for 62 order sets. Experiment 2 implemented an LLM-as-a-judge approach to align generated suggestions with expert ratings and developed a filter to further refine the system's performance. The study was performed at Vanderbilt University Medical Center (VUMC). A total of 735 suggestions for 71 order sets at VUMC were evaluated by 3 physicians.
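The abstract names five agent roles and a usefulness filter but does not describe how they are wired together. The sketch below is only an illustration of that idea under loud assumptions: the sequential chaining, the `Suggestion` type, the 1-5 `usefulness` field, the stand-in agents, and the threshold of 4 are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Suggestion:
    """Hypothetical container for one order set suggestion."""
    text: str
    usefulness: float  # assumed aligned usefulness score on a 1-5 scale
    notes: List[str] = field(default_factory=list)


# An agent is modeled as a function from suggestions to suggestions.
Agent = Callable[[List[Suggestion]], List[Suggestion]]

# Role names are taken from the abstract; their order here is an assumption.
ROLES = [
    "content critique",
    "dynamic search",
    "knowledge retrieval",
    "medication verification",
    "suggestion summarization",
]


def make_stub_agent(role: str) -> Agent:
    """Return a stand-in agent that only records that its role was applied."""
    def agent(suggestions: List[Suggestion]) -> List[Suggestion]:
        return [
            Suggestion(s.text, s.usefulness, s.notes + [role])
            for s in suggestions
        ]
    return agent


def usefulness_filter(threshold: float) -> Agent:
    """Keep only suggestions whose score meets the (assumed) threshold."""
    def agent(suggestions: List[Suggestion]) -> List[Suggestion]:
        return [s for s in suggestions if s.usefulness >= threshold]
    return agent


def run_pipeline(suggestions: List[Suggestion], agents: List[Agent]) -> List[Suggestion]:
    """Apply each agent in turn to the running list of suggestions."""
    for agent in agents:
        suggestions = agent(suggestions)
    return suggestions


# Made-up example input, not data from the study.
drafts = [
    Suggestion("Add a renal dosing note", usefulness=4.5),
    Suggestion("Reorder the lab panel section", usefulness=2.0),
]
agents = [make_stub_agent(r) for r in ROLES] + [usefulness_filter(4.0)]
kept = run_pipeline(drafts, agents)
print([s.text for s in kept])
```

In a real system each stub would wrap an LLM call or a retrieval step; the point of the sketch is only that the filter is a final, separable stage that can be tuned against expert ratings.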

MAIN OUTCOMES AND MEASURES

The ratings of accuracy, usefulness, feasibility, and impact; interrater agreement; and alignment against historical ordering data.

RESULTS

In evaluation 1 of experiment 1, the median values for the number of suggestions scoring 4 or higher at the order set level were 5 (IQR, 5-6) for the metrics of accuracy, 2 (IQR, 1-4) for usefulness, 1 (IQR, 0-3) for feasibility, and 1 (IQR, 0-2) for impact. Of 96 suggestions, 44 (46%; 95% CI, 36%-56%) aligned with historical ordering patterns. In evaluation 2 of experiment 1, 639 suggestions were generated for 62 order sets; 52 order sets had at least 1 useful suggestion, with a median of 2 (IQR, 1-3) useful suggestions. Overall, 122 suggestions (19%; 95% CI, 16%-22%) were rated as useful. After expert alignment, Cohen κ improved from 0.06 to 0.41. Filtering using the aligned scores reduced total suggestions by 29% while retaining 92% of useful suggestions.
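The two proportions above can be checked with a short calculation. The abstract does not state which interval method was used, so the normal-approximation (Wald) interval below is an assumption; the function name `wald_ci` is hypothetical. At whole-percent precision it reproduces both reported intervals.

```python
import math
from typing import Tuple


def wald_ci(successes: int, total: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (proportion, lower, upper) using the normal approximation."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clamp to [0, 1] because the Wald interval can overshoot the bounds.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Counts taken from the Results paragraph.
for k, n in [(44, 96), (122, 639)]:
    p, lo, hi = wald_ci(k, n)
    print(f"{k}/{n}: {p:.0%} (95% CI, {lo:.0%}-{hi:.0%})")
# 44/96: 46% (95% CI, 36%-56%)
# 122/639: 19% (95% CI, 16%-22%)
```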

CONCLUSIONS AND RELEVANCE

In this cohort study of an LLM-powered multiagent system for optimizing order sets, leveraging LLMs and multiagent systems provided a scalable approach. Alignment with a small set of expert ratings significantly enhanced the LLM evaluation. Future research could refine reasoning capabilities and integrate useful suggestions into electronic health records, while engaging end-users as artificial intelligence-supported reviewers.
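Agreement between the LLM judge and the expert ratings is reported as Cohen κ. As a reminder of what that statistic measures, here is a minimal from-scratch sketch; the function name and the binary "useful / not useful" labels are illustrative assumptions, and the example ratings are made up rather than taken from the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must label the same non-empty set of items")
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if the raters labeled independently at their own rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in labels) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch treats it as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up ratings: 1 = useful, 0 = not useful.
expert = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
llm_judge = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]
print(round(cohen_kappa(expert, llm_judge), 2))  # approximately 0.35
```

Here the raters agree on 7 of 10 items (0.70 observed) against 0.54 expected by chance, giving κ = 0.16 / 0.46.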

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d16/12457977/3f3cbdd5bb83/jamanetwopen-e2533277-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d16/12457977/4009c0ec38f4/jamanetwopen-e2533277-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d16/12457977/f9aff6a767a6/jamanetwopen-e2533277-g002.jpg

Similar articles

1. Optimizing Order Sets With a Large Language Model-Powered Multiagent System.
JAMA Netw Open. 2025 Sep 2;8(9):e2533277. doi: 10.1001/jamanetworkopen.2025.33277.
2. Development and Validation of a Large Language Model-Powered Chatbot for Neurosurgery: Mixed Methods Study on Enhancing Perioperative Patient Education.
J Med Internet Res. 2025 Jul 15;27:e74299. doi: 10.2196/74299.
3. Comparative Evaluation of a Medical Large Language Model in Answering Real-World Radiation Oncology Questions: Multicenter Observational Study.
J Med Internet Res. 2025 Sep 23;27:e69752. doi: 10.2196/69752.
4. Mid Forehead Brow Lift
5. Large Language Models' Clinical Decision-Making on When to Perform a Kidney Biopsy: Comparative Study.
J Med Internet Res. 2025 Sep 18;27:e73603. doi: 10.2196/73603.
6. Large Language Model Influence on Diagnostic Reasoning: A Randomized Clinical Trial.
JAMA Netw Open. 2024 Oct 1;7(10):e2440969. doi: 10.1001/jamanetworkopen.2024.40969.
7. Leveraging Retrieval-Augmented Large Language Models for Dietary Recommendations With Traditional Chinese Medicine's Medicine Food Homology: Algorithm Development and Validation.
JMIR Med Inform. 2025 Aug 21;13:e75279. doi: 10.2196/75279.
8. Shoulder Arthrogram
9. Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.
Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
10. Vesicoureteral Reflux
