Medical image captioning via generative pretrained transformers.

Affiliations

Skolkovo Institute of Science and Technology, Bolshoy blvd., 30/1, Moscow, 121205, Russia.

Philips (Russia), Skolkovo Technopark 42, Building 1, Bolshoi Boulevard, Moscow, 121205, Russia.

Publication information

Sci Rep. 2023 Mar 13;13(1):4171. doi: 10.1038/s41598-023-31223-5.

DOI:10.1038/s41598-023-31223-5

PMID:36914733

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10010644/

Abstract

The proposed model for automatic clinical image caption generation combines the analysis of radiological scans with structured patient information from textual records. It uses two language models, Show-Attend-Tell and GPT-3, to generate comprehensive and descriptive radiology records. The generated textual summary contains essential information about the pathologies found and their locations, and is accompanied by 2D heatmaps that localize each pathology on the scans. The model was tested on two medical datasets, Open-I and MIMIC-CXR, and on the general-purpose MS-COCO dataset; the results, measured with natural language assessment metrics, demonstrate that it can be applied effectively to chest X-ray image captioning.
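The abstract does not say how the 2D heatmaps are computed. As a rough illustration only, captioners in the Show-Attend-Tell family typically use soft attention: each image region gets a relevance score, a softmax turns the scores into weights, and the weights, laid out on the feature grid, can be viewed as a heatmap. The sketch below assumes that mechanism; the function names `soft_attention` and `context_vector`, the grid shape, and the input scores are hypothetical and are not taken from the paper.

```python
import math


def soft_attention(region_scores, grid_shape):
    """Convert per-region relevance scores into softmax weights and a 2D heatmap.

    Assumption: one score per cell of a rows x cols feature grid, in row-major order.
    """
    rows, cols = grid_shape
    if len(region_scores) != rows * cols:
        raise ValueError("expected one score per grid cell")

    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(region_scores)
    exps = [math.exp(score - peak) for score in region_scores]
    total = sum(exps)
    weights = [value / total for value in exps]

    # Reshape the flat weight list into rows so it can be rendered as a heatmap.
    heatmap = [weights[r * cols:(r + 1) * cols] for r in range(rows)]
    return weights, heatmap


def context_vector(weights, region_features):
    """Attention-weighted sum of region feature vectors (the decoder's visual input)."""
    dim = len(region_features[0])
    return [
        sum(w * feature[d] for w, feature in zip(weights, region_features))
        for d in range(dim)
    ]


if __name__ == "__main__":
    # Hypothetical scores for a 2 x 2 grid; the third region is the most relevant.
    weights, heatmap = soft_attention([0.0, 0.0, 2.0, 0.0], (2, 2))
    for row in heatmap:
        print(["%.3f" % value for value in row])
```

In a real captioner the scores would come from a learned function of the image features and the decoder state at each generated word, so a separate heatmap exists for every word of the report.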

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b65c/10011528/2a2975fc617f/41598_2023_31223_Fig1_HTML.jpg
