
Optimal Large Language Model Characteristics to Balance Accuracy and Energy Use for Sustainable Medical Applications.

Author affiliations

From the University of Maryland Medical Intelligent Imaging (UM2ii) Center, Department of Radiology and Nuclear Medicine, University of Maryland School of Medicine, 22 S Greene St, Baltimore, MD 21201 (F.X.D., D.S., A.K., P.H.Y., V.S.P.); Department of Radiology, University of Michigan, Ann Arbor, Mich (R.C.C.); and Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore, Md (A.J.).

Publication information

Radiology. 2024 Aug;312(2):e240320. doi: 10.1148/radiol.240320.

DOI:10.1148/radiol.240320
PMID:39189909
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11366671/
Abstract

Background: Large language models (LLMs) for medical applications use unknown amounts of energy, which contribute to the overall carbon footprint of the health care system.

Purpose: To investigate the tradeoffs between accuracy and energy use when using different LLM types and sizes for medical applications.

Materials and Methods: This retrospective study evaluated five different billion (B)-parameter sizes of two open-source LLMs (Meta's Llama 2, a general-purpose model, and LMSYS Org's Vicuna 1.5, a specialized fine-tuned model) using chest radiograph reports from the National Library of Medicine's Indiana University Chest X-ray Collection. Reports with missing demographic information and missing or blank files were excluded. Models were run on local compute clusters with visual computing graphic processing units. A single-task prompt explained clinical terminology and instructed each model to confirm the presence or absence of each of the 13 CheXpert disease labels. Energy use (in kilowatt-hours) was measured using an open-source tool. Accuracy was assessed with 13 CheXpert reference standard labels for diagnostic findings on chest radiographs, where overall accuracy was the mean of individual accuracies of all 13 labels. Efficiency ratios (accuracy per kilowatt-hour) were calculated for each model type and size.

Results: A total of 3665 chest radiograph reports were evaluated. The Vicuna 1.5 7B and 13B models had higher efficiency ratios (737.28 and 331.40, respectively) and higher overall labeling accuracy (93.83% [3438.69 of 3665 reports] and 93.65% [3432.38 of 3665 reports], respectively) than that of the Llama 2 models (7B: efficiency ratio of 13.39, accuracy of 7.91% [289.76 of 3665 reports]; 13B: efficiency ratio of 40.90, accuracy of 74.08% [2715.15 of 3665 reports]; 70B: efficiency ratio of 22.30, accuracy of 92.70% [3397.38 of 3665 reports]). Vicuna 1.5 7B had the highest efficiency ratio (737.28 vs 13.39 for Llama 2 7B). The larger Llama 2 70B model used more than seven times the energy of its 7B counterpart (4.16 kWh vs 0.59 kWh) with low overall accuracy, resulting in an efficiency ratio of only 22.30.

Conclusion: Smaller fine-tuned LLMs were more sustainable than larger general-purpose LLMs, using less energy without compromising accuracy, highlighting the importance of LLM selection for medical applications.

© RSNA, 2024
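
The two metrics defined in the abstract (overall accuracy as the mean of per-label accuracies, and efficiency ratio as accuracy per kilowatt-hour) can be illustrated with a minimal Python sketch. The function names are hypothetical and are not taken from the study; the inputs are only the rounded accuracy and energy figures that the abstract reports for Llama 2 7B and 70B. Because those inputs are rounded, the recomputed ratios differ slightly from the published 13.39 and 22.30, which were presumably computed from unrounded measurements.

```python
from statistics import mean


def overall_accuracy(per_label_accuracy):
    """Overall accuracy (%) as the mean of the individual label accuracies."""
    return mean(per_label_accuracy)


def efficiency_ratio(accuracy_percent, energy_kwh):
    """Efficiency ratio: accuracy (%) divided by energy use (kWh)."""
    if energy_kwh <= 0:
        raise ValueError("energy_kwh must be positive")
    return accuracy_percent / energy_kwh


# (accuracy %, energy kWh) pairs as reported in the abstract.
reported = {
    "Llama 2 7B": (7.91, 0.59),
    "Llama 2 70B": (92.70, 4.16),
}

# Recompute each efficiency ratio from the rounded published inputs.
ratios = {
    name: round(efficiency_ratio(acc, kwh), 2)
    for name, (acc, kwh) in reported.items()
}

# How many times more energy the 70B model used than the 7B model.
energy_multiple = round(reported["Llama 2 70B"][1] / reported["Llama 2 7B"][1], 2)

for name, ratio in ratios.items():
    print(f"{name}: efficiency ratio ~ {ratio}")
print(f"70B / 7B energy multiple ~ {energy_multiple}")
```

Dividing a reported accuracy by its reported efficiency ratio gives an implied energy figure in the same way (for example, 93.83 / 737.28 for Vicuna 1.5 7B), but that value is derived here, not stated in the abstract.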

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/30c0/11366671/5e26becd500f/radiol.240320.VA.jpg


