Suppr 超能文献



Evaluation of large language models in generating pulmonary nodule follow-up recommendations.

Author Information

Wen Junzhe, Huang Wanyue, Yan Huzheng, Sun Jie, Dong Mengshi, Li Chao, Qin Jie

Affiliations

Department of Radiology, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China.

Department of Interventional Radiology, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China.

Publication Information

Eur J Radiol Open. 2025 Apr 30;14:100655. doi: 10.1016/j.ejro.2025.100655. eCollection 2025 Jun.


DOI: 10.1016/j.ejro.2025.100655
PMID: 40391069
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12088779/
Abstract

RATIONALE AND OBJECTIVES: To evaluate the performance of large language models (LLMs) in generating clinical follow-up recommendations for pulmonary nodules by leveraging radiological report findings and management guidelines. MATERIALS AND METHODS: This retrospective study included CT follow-up reports of pulmonary nodules documented by senior radiologists from September 1st, 2023, to April 30th, 2024. An additional sixty reports were collected for prompt engineering based on few-shot learning and chain-of-thought methodology. Radiological findings of pulmonary nodules, along with the final prompt, were input into GPT-4o-mini or ERNIE-4.0-Turbo-8K to generate follow-up recommendations. The AI-generated recommendations were evaluated against radiologist-defined, guideline-based standards through binary classification, assessing nodule risk classification, follow-up intervals, and harmfulness. Performance metrics included sensitivity, specificity, positive/negative predictive values, and F1 score. RESULTS: Among 1009 reports from 996 patients (median age, 50.0 years; IQR, 39.0-60.0 years; 511 male patients), ERNIE-4.0-Turbo-8K and GPT-4o-mini demonstrated comparable performance in both accuracy of follow-up recommendations (94.6 % vs 92.8 %, P = 0.07) and harmfulness rates (2.9 % vs 3.5 %, P = 0.48). In nodule classification, the two models performed similarly, with accuracy of 99.8 % vs 99.9 %, sensitivity of 96.9 % vs 100.0 %, specificity of 99.9 % vs 99.9 %, positive predictive value of 96.9 % vs 96.9 %, negative predictive value of 100.0 % vs 99.9 %, and F1 score of 96.9 % vs 98.4 %, respectively. CONCLUSION: LLMs show promise in providing guideline-based follow-up recommendations for pulmonary nodules but require rigorous validation and supervision to mitigate potential clinical risks. This study offers insights into their potential role in automated radiological decision support.
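The prompting approach described in the methods (few-shot worked examples plus chain-of-thought reasoning, prepended to guideline text) can be sketched roughly as below. This is a hypothetical reconstruction, not the authors' actual prompt: the guideline summary, example cases, and field names are all placeholders for illustration.

```python
# Minimal sketch (hypothetical) of assembling a few-shot, chain-of-thought
# prompt that asks an LLM for a guideline-based follow-up recommendation
# given CT findings. GUIDELINE_SUMMARY and the examples are placeholders.

GUIDELINE_SUMMARY = "Solid nodule >= 8 mm: CT at 3 months; 6-8 mm: CT at 6-12 months; ..."

FEW_SHOT_EXAMPLES = [
    {
        "findings": "Solid nodule, 5 mm, right upper lobe, smooth margin.",
        "reasoning": "Solid nodule < 6 mm in a low-risk patient: no routine follow-up required.",
        "recommendation": "Low risk; optional CT at 12 months.",
    },
    # ... more worked examples would follow in a real prompt
]

def build_prompt(findings: str) -> str:
    """Compose guideline text, worked examples, and the new case into one prompt."""
    parts = ["You are a radiologist. Follow these guidelines:", GUIDELINE_SUMMARY, ""]
    for ex in FEW_SHOT_EXAMPLES:
        parts += [
            f"Findings: {ex['findings']}",
            f"Reasoning: {ex['reasoning']}",   # explicit chain-of-thought step
            f"Recommendation: {ex['recommendation']}",
            "",
        ]
    # End with the new case; the model is expected to continue after "Reasoning:"
    parts += [f"Findings: {findings}", "Reasoning:"]
    return "\n".join(parts)

print(build_prompt("Part-solid nodule, 9 mm, left lower lobe."))
```

The resulting string would then be sent to a chat-completion endpoint of the chosen model (GPT-4o-mini or ERNIE-4.0-Turbo-8K in the study); the API call itself is omitted here.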

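The binary-classification metrics reported in the results (sensitivity, specificity, PPV, NPV, F1 score) all derive from a 2x2 confusion matrix. A minimal sketch, using hypothetical counts rather than the study's actual data:

```python
# Compute the binary-classification metrics used in the evaluation from
# confusion-matrix counts. The example counts below are hypothetical.

def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return sensitivity, specificity, PPV, NPV, and F1 score."""
    sensitivity = tp / (tp + fn)   # recall on positive (e.g. high-risk) cases
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }

# Hypothetical example: 31 true positives, 1 false positive,
# 976 true negatives, 1 false negative.
print(classification_metrics(tp=31, fp=1, tn=976, fn=1))
```

With heavily imbalanced classes, as here (few high-risk nodules among ~1000 reports), specificity and NPV stay near 100 % almost regardless of model quality, which is why sensitivity, PPV, and F1 are the more informative columns in the reported comparison.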

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0a6/12088779/af6e899b602f/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0a6/12088779/340639e04587/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0a6/12088779/64be5dfd1dcc/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0a6/12088779/9a6140f6064d/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0a6/12088779/375a976cd3bc/gr5.jpg

Similar Articles

[1]
Evaluation of large language models in generating pulmonary nodule follow-up recommendations.

Eur J Radiol Open. 2025-4-30

[2]
Privacy-ensuring Open-weights Large Language Models Are Competitive with Closed-weights GPT-4o in Extracting Chest Radiography Findings from Free-Text Reports.

Radiology. 2025-1

[3]
Large language models for efficient whole-organ MRI score-based reports and categorization in knee osteoarthritis.

Insights Imaging. 2025-5-14

[4]
Extracting Pulmonary Embolism Diagnoses From Radiology Impressions Using GPT-4o: Large Language Model Evaluation Study.

JMIR Med Inform. 2025-4-9

[5]
Conversion of Mixed-Language Free-Text CT Reports of Pancreatic Cancer to National Comprehensive Cancer Network Structured Reporting Templates by Using GPT-4.

Korean J Radiol. 2025-6

[6]
Evaluation of Large Language Models in Tailoring Educational Content for Cancer Survivors and Their Caregivers: Quality Analysis.

JMIR Cancer. 2025-4-7

[7]
Lung Cancer Staging Using Chest CT and FDG PET/CT Free-Text Reports: Comparison Among Three ChatGPT Large Language Models and Six Human Readers of Varying Experience.

AJR Am J Roentgenol. 2024-12

[8]
Performance of GPT-4 Turbo and GPT-4o in Korean Society of Radiology In-Training Examinations.

Korean J Radiol. 2025-6

[9]
Evaluating the Role of GPT-4 and GPT-4o in the Detectability of Chest Radiography Reports Requiring Further Assessment.

Cureus. 2024-12-11

[10]
Evaluating GPT-4o's Performance in the Official European Board of Radiology Exam: A Comprehensive Assessment.

Acad Radiol. 2024-11

