Preliminary assessment of automated radiology report generation with generative pre-trained transformers: comparing results to radiologist-generated reports.

Affiliations

Department of Diagnostic Radiology, Graduate School of Medical Sciences, Kumamoto University, 1-1-1 Honjo, Chuo-ku, Kumamoto-shi, Kumamoto, 860-8556, Japan.

Department of Medical Physics, Faculty of Life Sciences, Kumamoto University, Honjo 1-1-1, Kumamoto, 860-8556, Japan.

Publication information

Jpn J Radiol. 2024 Feb;42(2):190-200. doi: 10.1007/s11604-023-01487-y. Epub 2023 Sep 15.

DOI: 10.1007/s11604-023-01487-y

PMID: 37713022

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10811038/

Abstract

PURPOSE

In this preliminary study, we aimed to evaluate the potential of the generative pre-trained transformer (GPT) series for generating radiology reports from concise imaging findings and to compare the generated reports with radiologist-generated reports.

METHODS

This retrospective study involved 28 patients who underwent computed tomography (CT) scans and had a diagnosed disease with typical imaging findings. Radiology reports were generated using GPT-2, GPT-3.5, and GPT-4 based on the patient's age, gender, disease site, and imaging findings. We calculated the top-1 accuracy, top-5 accuracy, and mean average precision (MAP) of the differential diagnoses for GPT-2, GPT-3.5, GPT-4, and radiologists. Two board-certified radiologists evaluated the grammar and readability, image findings, impression, differential diagnosis, and overall quality of all reports using a 4-point scale.
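The ranking metrics named above can be illustrated with a minimal sketch. The abstract does not give the exact definitions used in the study, so this sketch assumes one correct diagnosis per case, in which case average precision for a case reduces to the reciprocal of the rank at which the correct diagnosis appears (0 if it is absent). The function names and the toy cases are hypothetical and are not study data.

```python
from typing import Sequence


def top_k_accuracy(ranked: Sequence[Sequence[str]], truths: Sequence[str], k: int) -> float:
    """Fraction of cases whose correct diagnosis is among the first k candidates."""
    hits = sum(1 for candidates, truth in zip(ranked, truths) if truth in candidates[:k])
    return hits / len(truths)


def mean_average_precision(ranked: Sequence[Sequence[str]], truths: Sequence[str]) -> float:
    """MAP under the assumption of a single correct diagnosis per case.

    With one relevant item, average precision equals 1 / rank of that item,
    and 0 when the item is missing from the list.
    """
    total = 0.0
    for candidates, truth in zip(ranked, truths):
        if truth in candidates:
            total += 1.0 / (list(candidates).index(truth) + 1)
    return total / len(truths)


# Hypothetical toy data: ranked differential diagnoses for four made-up cases.
ranked_lists = [
    ["appendicitis", "diverticulitis", "colitis"],
    ["pneumonia", "lung cancer"],
    ["renal cyst", "renal cell carcinoma", "angiomyolipoma"],
    ["hepatic hemangioma", "hepatocellular carcinoma"],
]
correct = ["appendicitis", "lung cancer", "angiomyolipoma", "cholangiocarcinoma"]

print("top-1:", top_k_accuracy(ranked_lists, correct, 1))
print("top-5:", top_k_accuracy(ranked_lists, correct, 5))
print("MAP:  ", mean_average_precision(ranked_lists, correct))
```

Top-1 accuracy rewards only a correct first guess, top-5 accuracy gives credit for any appearance in the first five candidates, and MAP additionally reflects how high in the list the correct diagnosis is placed.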

RESULTS

Top-1 and top-5 accuracies for the differential diagnoses were highest for radiologists, followed by GPT-4, GPT-3.5, and GPT-2, in that order (top-1: 1.00, 0.54, 0.54, and 0.21, respectively; top-5: 1.00, 0.96, 0.89, and 0.54, respectively). Qualitative scores for grammar and readability, image findings, and overall quality did not differ significantly between radiologists and GPT-3.5 or GPT-4 (p > 0.05). However, the GPT series scored significantly lower than radiologists on impression and differential diagnosis (p < 0.05).

CONCLUSIONS

Our preliminary study suggests that GPT-3.5 and GPT-4 may be able to generate radiology reports with high readability and reasonable image findings from very short keywords; however, concerns persist regarding the accuracy of impressions and differential diagnoses, so verification by radiologists is required.

Fig. 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c694/10811038/3c148208be62/11604_2023_1487_Fig1_HTML.jpg
