Aichi Chien, Hubert Tang, Bhavita Jagessar, Kai-Wei Chang, Nanyun Peng, Kambiz Nael, Noriko Salamon
From the Department of Radiological Science (A.C., H.T., B.J., K.N., N.S.), David Geffen School of Medicine at UCLA, Los Angeles, California.
AJNR Am J Neuroradiol. 2024 Feb 7;45(2):244-248. doi: 10.3174/ajnr.A8102.
Review of clinical reports is an essential part of monitoring disease progression, and synthesizing multiple imaging reports is important for clinical decision-making. Aggregating this information quickly and accurately is critical. Machine learning natural language processing (NLP) models hold promise to address the unmet need for automated report summarization.
We evaluated NLP methods for summarizing longitudinal aneurysm reports. A total of 137 clinical reports and 100 PubMed case reports were used in this study. Models were 1) compared against expert-generated summaries of longitudinal imaging notes collected at our institution and 2) compared using publicly accessible PubMed case reports. Five AI models were used to summarize the clinical reports, and a sixth, the online GPT3davinci NLP large language model (LLM), was added for summarization of the PubMed case reports. We assessed summary quality by comparison with expert summaries using quantitative metrics and by expert quality review.
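As a minimal sketch of the abstractive summarization step, the snippet below runs a publicly available BART checkpoint fine-tuned on CNN/DailyMail (a reasonable stand-in for the BARTcnn model named above; the exact checkpoints and preprocessing used in the study are not specified here) on a hypothetical longitudinal report via the Hugging Face `transformers` pipeline.

```python
# Sketch only: summarize a longitudinal aneurysm report with a BART-CNN checkpoint.
# The report text is a hypothetical placeholder, not data from the study.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

longitudinal_report = (
    "Baseline CTA: 4 mm saccular aneurysm of the right MCA bifurcation. "
    "Follow-up MRA at 12 months: aneurysm measures 5 mm, interval growth. "
    "Follow-up MRA at 24 months: aneurysm stable at 5 mm, no new aneurysm identified."
)

# Generate a short abstractive summary of the concatenated report text.
result = summarizer(longitudinal_report, max_length=60, min_length=15, do_sample=False)
print(result[0]["summary_text"])
```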
In clinical summarization, BARTcnn had the best performance (BERTscore = 0.8371), followed by LongT5Booksum and LEDlegal. In the analysis using PubMed case reports, GPT3davinci demonstrated the best performance, followed by BARTcnn and LEDbooksum (BERTscore = 0.894, 0.872, and 0.867, respectively).
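For the quantitative comparison against expert summaries, the BERTscore values reported above can be reproduced in principle with the `bert_score` Python package; the candidate and reference strings below are hypothetical placeholders rather than study data.

```python
# Sketch only: score a model-generated summary against an expert reference with BERTScore.
from bert_score import score

candidates = ["Right MCA aneurysm grew from 4 mm to 5 mm and is now stable."]
references = ["Right MCA bifurcation aneurysm enlarged from 4 to 5 mm and has remained stable."]

# Returns precision, recall, and F1 tensors; the F1 mean is the value typically reported.
P, R, F1 = score(candidates, references, lang="en", verbose=False)
print(f"BERTScore F1: {F1.mean().item():.4f}")
```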
AI NLP summarization models demonstrated great potential for summarizing longitudinal aneurysm reports, though none yet reached the quality required for clinical use. We found that the online GPT LLM outperformed the others; however, the BARTcnn model is potentially more useful because it can be implemented on-site. Future work to improve summarization, address other types of neuroimaging reports, and develop structured reports may allow NLP models to ease clinical workflow.