Large Language Models for CAD-RADS 2.0 Extraction From Semi-Structured Coronary CT Angiography Reports: A Multi-Institutional Study.

Authors

Min Dabin, Jin Kwang Nam, Bang SangHeum, Kim Moon Young, Kim Hack-Lyoung, Jeong Won Gi, Lee Hye-Jeong, Beck Kyongmin Sarah, Hwang Sung Ho, Kim Eun Young, Park Chang Min

Affiliations

Interdisciplinary Program in Bioengineering, Seoul National University Graduate School, Seoul, Republic of Korea.

Integrated Major in Innovative Medical Science, Seoul National University Graduate School, Seoul, Republic of Korea.

Publication information

Korean J Radiol. 2025 Sep;26(9):817-831. doi: 10.3348/kjr.2025.0293.

DOI:10.3348/kjr.2025.0293
PMID:40873373
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12394816/
Abstract

OBJECTIVE

To evaluate the accuracy of large language models (LLMs) in extracting Coronary Artery Disease-Reporting and Data System (CAD-RADS) 2.0 components from coronary CT angiography (CCTA) reports, and assess the impact of prompting strategies.

MATERIALS AND METHODS

In this multi-institutional study, we collected 319 synthetic, semi-structured CCTA reports from six institutions to protect patient privacy while maintaining clinical relevance. The dataset included 150 reports from a primary institution (100 for instruction development and 50 for internal testing) and 169 reports from five external institutions for external testing. Board-certified radiologists established reference standards following the CAD-RADS 2.0 guidelines for all three components: stenosis severity, plaque burden, and modifiers. Six LLMs (GPT-4, GPT-4o, Claude-3.5-Sonnet, o1-mini, Gemini-1.5-Pro, and DeepSeek-R1-Distill-Qwen-14B) were evaluated using an optimized instruction with prompting strategies, including zero-shot or few-shot with or without chain-of-thought (CoT) prompting. The accuracy was assessed and compared using McNemar's test.
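The abstract names McNemar's test but does not say whether the exact or the chi-square form was used, so the following is only a minimal sketch of the exact (binomial) form for comparing two prompting strategies on the same reports. The function names and the toy correctness vectors are hypothetical and are not data from the study.

```python
from math import comb


def discordant_counts(correct_a, correct_b):
    """Count paired disagreements between two strategies on the same reports.

    Returns (b, c): b = A correct and B wrong, c = B correct and A wrong.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both strategies must be scored on the same reports")
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant pair counts."""
    n = b + c
    if n == 0:
        return 1.0  # no disagreements, so no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) tail probability of seeing k or fewer in the smaller cell
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy per-report correctness flags (hypothetical, for illustration only)
zero_shot = [True, False, False, True, False, False, True, False]
with_cot = [True, True, True, True, True, False, True, True]

b, c = discordant_counts(zero_shot, with_cot)
p_value = mcnemar_exact(b, c)
print(f"discordant pairs: b={b}, c={c}, exact p={p_value:.3f}")
```

Only the reports on which the two strategies disagree influence the result, which is why the test suits paired comparisons of the same model under different prompts.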

RESULTS

LLMs demonstrated robust accuracy across all CAD-RADS 2.0 components. Peak stenosis severity accuracies reached 0.980 (48/49, Claude-3.5-Sonnet and o1-mini) in internal testing and 0.946 (158/167, GPT-4o and o1-mini) in external testing. Plaque burden extraction showed exceptional accuracy, with multiple models achieving perfect accuracy (43/43) in internal testing and 0.993 (137/138, GPT-4o and o1-mini) in external testing. Modifier detection demonstrated consistently high accuracy (≥0.990) across most models. One open-source model, DeepSeek-R1-Distill-Qwen-14B, showed a relatively low accuracy for stenosis severity: 0.898 (44/49, internal) and 0.820 (137/167, external). CoT prompting significantly enhanced the accuracy of several models, with GPT-4 showing the most substantial improvements: stenosis severity accuracy increased by 0.192 (P < 0.001) and plaque burden accuracy by 0.152 (P < 0.001) in external testing.
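The reported accuracies are simple proportions of correctly extracted reports, so they can be checked directly from the counts given above. The short sketch below recomputes each one; the dictionary labels are my own shorthand, and the counts and stated values are copied from the abstract.

```python
# (correct, total, accuracy as stated in the abstract)
reported = {
    "stenosis, internal, best models": (48, 49, 0.980),
    "stenosis, external, best models": (158, 167, 0.946),
    "plaque burden, internal, best models": (43, 43, 1.000),
    "plaque burden, external, best models": (137, 138, 0.993),
    "stenosis, internal, DeepSeek-R1-Distill-Qwen-14B": (44, 49, 0.898),
    "stenosis, external, DeepSeek-R1-Distill-Qwen-14B": (137, 167, 0.820),
}


def accuracy(correct, total):
    """Proportion of reports for which the extracted value matched the reference."""
    return correct / total


for label, (correct, total, stated) in reported.items():
    computed = accuracy(correct, total)
    print(f"{label}: {correct}/{total} = {computed:.3f} (abstract: {stated:.3f})")
```

Each recomputed proportion rounds to the value stated in the abstract at three decimal places.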

CONCLUSION

LLMs demonstrated high accuracy in automated extraction of CAD-RADS 2.0 components from semi-structured CCTA reports, particularly when used with CoT prompting.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0393/12394816/5c7e4aa5553c/kjr-26-817-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0393/12394816/914b5fe2e6a9/kjr-26-817-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0393/12394816/fb579f1c8074/kjr-26-817-g002.jpg

