Preliminary assessment of automated radiology report generation with generative pre-trained transformers: comparing results to radiologist-generated reports.

Affiliations

Department of Diagnostic Radiology, Graduate School of Medical Sciences, Kumamoto University, 1-1-1 Honjo, Chuo-ku, Kumamoto-shi, Kumamoto, 860-8556, Japan.

Department of Medical Physics, Faculty of Life Sciences, Kumamoto University, Honjo 1-1-1, Kumamoto, 860-8556, Japan.

Publication information

Jpn J Radiol. 2024 Feb;42(2):190-200. doi: 10.1007/s11604-023-01487-y. Epub 2023 Sep 15.

DOI:10.1007/s11604-023-01487-y
PMID:37713022
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10811038/
Abstract

PURPOSE

In this preliminary study, we aimed to evaluate the potential of the generative pre-trained transformer (GPT) series for generating radiology reports from concise imaging findings and compare its performance with radiologist-generated reports.

METHODS

This retrospective study involved 28 patients who underwent computed tomography (CT) scans and had a diagnosed disease with typical imaging findings. Radiology reports were generated using GPT-2, GPT-3.5, and GPT-4 based on the patient's age, gender, disease site, and imaging findings. We calculated the top-1, top-5 accuracy, and mean average precision (MAP) of differential diagnoses for GPT-2, GPT-3.5, GPT-4, and radiologists. Two board-certified radiologists evaluated the grammar and readability, image findings, impression, differential diagnosis, and overall quality of all reports using a 4-point scale.
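The ranking metrics named above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so this assumes each case has a single correct diagnosis, in which case average precision reduces to the reciprocal rank of that diagnosis (0 if it is absent from the list). The function names and the toy cases are hypothetical and are not taken from the study.

```python
from typing import Sequence


def top_k_accuracy(ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int) -> float:
    """Fraction of cases whose true diagnosis appears in the first k ranked candidates."""
    hits = sum(1 for preds, t in zip(ranked, truth) if t in list(preds)[:k])
    return hits / len(truth)


def mean_average_precision(ranked: Sequence[Sequence[str]], truth: Sequence[str]) -> float:
    """MAP under the assumption of one correct diagnosis per case.

    With a single relevant item, average precision equals 1 / rank of that
    item, or 0 when the item is missing from the ranked list.
    """
    total = 0.0
    for preds, t in zip(ranked, truth):
        preds = list(preds)
        if t in preds:
            total += 1.0 / (preds.index(t) + 1)
    return total / len(truth)


# Hypothetical toy data: four cases, each with a ranked differential diagnosis list.
ranked = [
    ["appendicitis", "diverticulitis", "colitis"],
    ["pneumonia", "lung cancer", "tuberculosis"],
    ["renal cyst", "renal cell carcinoma", "angiomyolipoma"],
    ["hepatic cyst", "hemangioma", "metastasis"],
]
truth = ["appendicitis", "lung cancer", "angiomyolipoma", "hepatocellular carcinoma"]

top1 = top_k_accuracy(ranked, truth, k=1)
top5 = top_k_accuracy(ranked, truth, k=5)
map_score = mean_average_precision(ranked, truth)
print(top1, top5, round(map_score, 4))
```

In this toy example only the first case is correct at rank 1, three of four true diagnoses appear somewhere in the list, and the fourth contributes 0 to MAP because its true diagnosis is never proposed.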

RESULTS

Top-1 and Top-5 accuracies for the differential diagnoses were highest for radiologists, followed by GPT-4, GPT-3.5, and GPT-2, in that order (Top-1: 1.00, 0.54, 0.54, and 0.21, respectively; Top-5: 1.00, 0.96, 0.89, and 0.54, respectively). Qualitative scores for grammar and readability, image findings, and overall quality did not differ significantly between radiologists and GPT-3.5 or GPT-4 (p > 0.05). However, the GPT series scored significantly lower than radiologists on impression and differential diagnosis (p < 0.05).
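As a rough sanity check, the reported accuracies can be mapped back to case counts. The abstract does not state these counts; the sketch below assumes every accuracy is a number of correct cases divided by all 28 patients, and the variable names are illustrative only.

```python
N_PATIENTS = 28  # cohort size stated in the Methods

# Accuracies as reported in the abstract, ordered radiologists, GPT-4, GPT-3.5, GPT-2.
reported = {
    "top1": {"Radiologists": 1.00, "GPT-4": 0.54, "GPT-3.5": 0.54, "GPT-2": 0.21},
    "top5": {"Radiologists": 1.00, "GPT-4": 0.96, "GPT-3.5": 0.89, "GPT-2": 0.54},
}

# Nearest whole number of correct cases implied by each accuracy (an inference, not source data).
implied = {
    metric: {reader: round(acc * N_PATIENTS) for reader, acc in scores.items()}
    for metric, scores in reported.items()
}

for metric, counts in implied.items():
    for reader, count in counts.items():
        recomputed = count / N_PATIENTS
        print(f"{metric:5s} {reader:12s} {count:2d}/{N_PATIENTS} = {recomputed:.2f}")
```

Under that assumption each implied count, divided by 28 and rounded to two decimals, reproduces the reported value (for example, 15/28 ≈ 0.54 and 27/28 ≈ 0.96).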

CONCLUSIONS

Our preliminary study suggests that GPT-3.5 and GPT-4 may be able to generate radiology reports with high readability and reasonable image findings from very short keywords; however, concerns persist regarding the accuracy of impressions and differential diagnoses, so their output requires verification by radiologists.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c694/10811038/3c148208be62/11604_2023_1487_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c694/10811038/e07fe642c8c3/11604_2023_1487_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c694/10811038/7c75c4cb3587/11604_2023_1487_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c694/10811038/eab2b37f2cca/11604_2023_1487_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c694/10811038/65f2cb33b1a3/11604_2023_1487_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c694/10811038/740eab05f9f6/11604_2023_1487_Fig6_HTML.jpg
