Evaluating acute image ordering for real-world patient cases via language model alignment with radiological guidelines.

Author information

Yao Michael S, Chae Allison, Saraiya Piya, Kahn Charles E, Witschey Walter R, Gee James C, Sagreiya Hersh, Bastani Osbert

Affiliations

Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA.

Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

Publication information

Commun Med (Lond). 2025 Aug 4;5(1):332. doi: 10.1038/s43856-025-01061-9.

DOI:10.1038/s43856-025-01061-9

PMID:40760099

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12322208/

Abstract

BACKGROUND

Diagnostic imaging studies are increasingly important in the management of acutely presenting patients. However, ordering appropriate imaging studies in the emergency department is a challenging task with a high degree of variability among healthcare providers. To address this issue, recent work has investigated whether generative AI and large language models can be leveraged to recommend diagnostic imaging studies in accordance with evidence-based medical guidelines. However, it remains challenging to ensure that these tools can provide recommendations that correctly align with medical guidelines, especially given the limited diagnostic information available in acute care settings.

METHODS

In this study, we introduce a framework to intelligently leverage language models by recommending imaging studies for patient cases that align with the American College of Radiology's Appropriateness Criteria, a set of evidence-based guidelines. To power our experiments, we introduce RadCases, a dataset of over 1500 annotated case summaries reflecting common patient presentations, and apply our framework to enable state-of-the-art language models to reason about appropriate imaging choices.
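
The abstract does not spell out how the framework works internally, so the following is only a minimal sketch under an assumed two-step design: match a case summary to a guideline topic, then look up the imaging studies listed for that topic. Every name here (`TOPIC_TO_STUDIES`, `keyword_topic_matcher`, `recommend_studies`, `accuracy`) is hypothetical, the keyword matcher is a toy stand-in for a language model, and the table entries are placeholders rather than content from the American College of Radiology's Appropriateness Criteria or from RadCases.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Placeholder guideline table: topic -> imaging studies.
# These entries are invented for illustration and are NOT actual guideline content.
TOPIC_TO_STUDIES: Dict[str, List[str]] = {
    "topic_headache": ["study_A"],
    "topic_chest_pain": ["study_B", "study_C"],
}


def keyword_topic_matcher(case_summary: str) -> str:
    """Toy stand-in for a language model: choose a topic by keyword."""
    text = case_summary.lower()
    if "headache" in text:
        return "topic_headache"
    if "chest pain" in text:
        return "topic_chest_pain"
    return "unknown"


def recommend_studies(
    case_summary: str,
    match_topic: Callable[[str], str] = keyword_topic_matcher,
) -> List[str]:
    """Map a case summary to a topic, then look up that topic's studies."""
    topic = match_topic(case_summary)
    # An unmatched topic yields no recommendation instead of a guess.
    return TOPIC_TO_STUDIES.get(topic, [])


def accuracy(
    cases: Sequence[Tuple[str, List[str]]],
    match_topic: Callable[[str], str] = keyword_topic_matcher,
) -> float:
    """Fraction of cases whose recommendation equals the annotated studies."""
    if not cases:
        return 0.0
    correct = sum(
        recommend_studies(summary, match_topic) == expected
        for summary, expected in cases
    )
    return correct / len(cases)
```

Keeping the lookup table separate from the matcher means the recommended studies always come from the table, never from free-form model output; in this sketch, swapping in a real model would only change the `match_topic` callable.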

RESULTS

Using our framework, state-of-the-art language models achieve accuracy comparable to clinicians in ordering imaging studies. Furthermore, we demonstrate that our language model-based pipeline can be used as an intelligent assistant by clinicians to support image ordering workflows and improve the accuracy of acute image ordering according to the American College of Radiology's Appropriateness Criteria.

CONCLUSIONS

Our work demonstrates and validates a strategy to leverage AI-based software to improve trustworthy clinical decision-making in alignment with expert evidence-based guidelines.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/97b4/12322208/03eca54a1ba5/43856_2025_1061_Fig1_HTML.jpg
