
A strategy for cost-effective large language model use at health system-scale.

Authors

Klang Eyal, Apakama Donald, Abbott Ethan E, Vaid Akhil, Lampert Joshua, Sakhuja Ankit, Freeman Robert, Charney Alexander W, Reich David, Kraft Monica, Nadkarni Girish N, Glicksberg Benjamin S

Affiliations

Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Publication details

NPJ Digit Med. 2024 Nov 18;7(1):320. doi: 10.1038/s41746-024-01315-1.

DOI:10.1038/s41746-024-01315-1
PMID:39558090
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11574261/
Abstract

Large language models (LLMs) can optimize clinical workflows; however, the economic and computational challenges of their utilization at the health system scale are underexplored. We evaluated how concatenating queries with multiple clinical notes and tasks simultaneously affects model performance under increasing computational loads. We assessed ten LLMs of different capacities and sizes utilizing real-world patient data. We conducted >300,000 experiments of various task sizes and configurations, measuring accuracy in question-answering and the ability to properly format outputs. Performance deteriorated as the number of questions and notes increased. High-capacity models, like Llama-3-70b, had low failure rates and high accuracies. GPT-4-turbo-128k was similarly resilient across task burdens, but performance deteriorated after 50 tasks at large prompt sizes. After addressing mitigable failures, these two models can concatenate up to 50 simultaneous tasks effectively, with validation on a public medical question-answering dataset. An economic analysis demonstrated up to a 17-fold cost reduction at 50 tasks using concatenation. These results identify the limits of LLMs for effective utilization and highlight avenues for cost-efficiency at the enterprise scale.
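
The cost argument behind concatenation can be illustrated with a rough sketch. It assumes, as a simplification not taken from the paper, that the saving comes from paying for the shared context (instructions plus clinical notes) once per concatenated prompt rather than once per question. All token counts, prices, and names (`PromptSizes`, `separate_cost`, `concatenated_cost`, `cost_reduction`) are hypothetical, and the resulting ratio is illustrative only; it is not a reproduction of the reported 17-fold figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptSizes:
    """Hypothetical token counts; none of these values come from the paper."""
    context_tokens: int   # shared instructions plus clinical notes
    question_tokens: int  # tokens per individual question
    answer_tokens: int    # tokens generated per answer


def separate_cost(n_tasks: int, sizes: PromptSizes,
                  in_price: float, out_price: float) -> float:
    """Cost when every question is sent in its own request,
    so the shared context is paid for n_tasks times."""
    per_call = ((sizes.context_tokens + sizes.question_tokens) * in_price
                + sizes.answer_tokens * out_price)
    return n_tasks * per_call


def concatenated_cost(n_tasks: int, sizes: PromptSizes,
                      in_price: float, out_price: float) -> float:
    """Cost when all questions share one request,
    so the shared context is paid for once."""
    input_tokens = sizes.context_tokens + n_tasks * sizes.question_tokens
    output_tokens = n_tasks * sizes.answer_tokens
    return input_tokens * in_price + output_tokens * out_price


def cost_reduction(n_tasks: int, sizes: PromptSizes,
                   in_price: float, out_price: float) -> float:
    """Ratio of separate-request cost to concatenated cost."""
    return (separate_cost(n_tasks, sizes, in_price, out_price)
            / concatenated_cost(n_tasks, sizes, in_price, out_price))


if __name__ == "__main__":
    # Assumed sizes and prices in arbitrary cost units per token.
    sizes = PromptSizes(context_tokens=2000, question_tokens=20, answer_tokens=10)
    for n in (1, 10, 50):
        print(n, round(cost_reduction(n, sizes, in_price=1.0, out_price=3.0), 2))
```

Under this model the ratio grows with the number of tasks and with the share of the prompt taken up by shared context. It ignores the accuracy and formatting failures that the study reports at higher task counts, which limit how far concatenation can be pushed in practice.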

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/122d/11574261/ca3221906b05/41746_2024_1315_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/122d/11574261/725b3021f1e4/41746_2024_1315_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/122d/11574261/e1ffbe2c6cbe/41746_2024_1315_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/122d/11574261/ab14d29b60ca/41746_2024_1315_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/122d/11574261/1235d7ef4765/41746_2024_1315_Fig4_HTML.jpg
