Evaluation of large language models in generating pulmonary nodule follow-up recommendations.

Authors

Wen Junzhe, Huang Wanyue, Yan Huzheng, Sun Jie, Dong Mengshi, Li Chao, Qin Jie

Affiliations

Department of Radiology, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China.

Department of Interventional Radiology, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China.

Publication information

Eur J Radiol Open. 2025 Apr 30;14:100655. doi: 10.1016/j.ejro.2025.100655. eCollection 2025 Jun.

DOI:10.1016/j.ejro.2025.100655
PMID:40391069
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12088779/
Abstract

RATIONALE AND OBJECTIVES

To evaluate the performance of large language models (LLMs) in generating clinical follow-up recommendations for pulmonary nodules by leveraging radiological report findings and management guidelines.

MATERIALS AND METHODS

This retrospective study included CT follow-up reports of pulmonary nodules documented by senior radiologists from September 1, 2023, to April 30, 2024. An additional 60 reports were collected for prompt engineering, which was based on few-shot learning and the chain-of-thought methodology. The radiological findings for each pulmonary nodule, together with the final prompt, were input into GPT-4o-mini or ERNIE-4.0-Turbo-8K to generate follow-up recommendations. The AI-generated recommendations were evaluated against radiologist-defined, guideline-based standards through binary classification, assessing nodule risk classification, follow-up interval, and harmfulness. Performance metrics included sensitivity, specificity, positive and negative predictive values, and F1 score.
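The listed performance metrics all derive from a 2x2 confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' code; the class name, function name, and the counts in the example are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for one binary evaluation (hypothetical container)."""
    tp: int  # model positive, reference positive
    fp: int  # model positive, reference negative
    fn: int  # model negative, reference positive
    tn: int  # model negative, reference negative


def binary_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Return the standard binary classification metrics for the counts."""
    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
        "sensitivity": ratio(c.tp, c.tp + c.fn),
        "specificity": ratio(c.tn, c.tn + c.fp),
        "ppv": ratio(c.tp, c.tp + c.fp),
        "npv": ratio(c.tn, c.tn + c.fn),
        # F1 is the harmonic mean of PPV (precision) and sensitivity (recall).
        "f1": ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


# Illustrative counts only; these are NOT taken from the study.
example = ConfusionCounts(tp=40, fp=10, fn=10, tn=940)
metrics = binary_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a rare positive class, accuracy and specificity can stay high even when sensitivity or PPV is modest, which is one reason to report the full set of metrics together.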

RESULTS

Across 1009 reports from 996 patients (median age, 50.0 years; IQR, 39.0-60.0 years; 511 male patients), ERNIE-4.0-Turbo-8K and GPT-4o-mini demonstrated comparable performance in both accuracy of follow-up recommendations (94.6% vs 92.8%, P = 0.07) and harmfulness rates (2.9% vs 3.5%, P = 0.48). In nodule classification, ERNIE-4.0-Turbo-8K and GPT-4o-mini performed similarly, with accuracy of 99.8% vs 99.9%, sensitivity of 96.9% vs 100.0%, specificity of 99.9% vs 99.9%, positive predictive value of 96.9% vs 96.9%, negative predictive value of 100.0% vs 99.9%, and F1 score of 96.9% vs 98.4%, respectively.
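As a quick arithmetic check, the F1 score can be recomputed from the reported positive predictive value and sensitivity, since F1 is their harmonic mean. The sketch below uses only the rounded percentages stated above; the function and variable names are hypothetical, and because the inputs are rounded, agreement is expected only to about three decimal places.

```python
def f1_from_ppv_and_sensitivity(ppv: float, sensitivity: float) -> float:
    """Harmonic mean of PPV (precision) and sensitivity (recall)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Rounded values as reported in the abstract, expressed as proportions.
reported = {
    "ERNIE-4.0-Turbo-8K": {"ppv": 0.969, "sensitivity": 0.969, "f1": 0.969},
    "GPT-4o-mini": {"ppv": 0.969, "sensitivity": 1.000, "f1": 0.984},
}

recomputed = {
    model: f1_from_ppv_and_sensitivity(v["ppv"], v["sensitivity"])
    for model, v in reported.items()
}

for model, value in recomputed.items():
    print(f"{model}: recomputed F1 = {value:.3f}, "
          f"reported F1 = {reported[model]['f1']:.3f}")
```

For both models the recomputed value matches the reported F1 score after rounding.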

CONCLUSION

LLMs show promise in providing guideline-based follow-up recommendations for pulmonary nodules, but require rigorous validation and supervision to mitigate potential clinical risks. This study offers insights into their potential role in automated radiological decision support.
