


Understanding transfer learning for chest radiograph clinical report generation with modified transformer architectures.

Authors

Vendrow Edward, Schonfeld Ethan

Affiliations

Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 50 Vassar St, Cambridge, MA, United States of America.

School of Medicine, Stanford University, 291 Campus Drive, Stanford, CA, United States of America.

Publication

Heliyon. 2023 Jul 10;9(7):e17968. doi: 10.1016/j.heliyon.2023.e17968. eCollection 2023 Jul.

DOI: 10.1016/j.heliyon.2023.e17968
PMID: 37519756
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10372225/
Abstract

The image captioning task is increasingly prevalent in artificial intelligence applications for medicine. One important application is clinical report generation from chest radiographs. The clinical writing of unstructured reports is time-consuming and error-prone. An automated system would improve standardization, error reduction, time consumption, and medical accessibility. In this paper we demonstrate the importance of domain-specific pre-training and propose a modified transformer architecture for the medical image captioning task. To accomplish this, we train a series of modified transformers to generate clinical reports from chest radiograph image input. These modified transformers include: a meshed-memory augmented transformer architecture with visual extractor using ImageNet pre-trained weights, a meshed-memory augmented transformer architecture with visual extractor using CheXpert pre-trained weights, and a meshed-memory augmented transformer whose encoder is passed the concatenated embeddings using both ImageNet pre-trained weights and CheXpert pre-trained weights. We use BLEU(1-4), ROUGE-L, CIDEr, and the clinical CheXbert F1 scores to validate our models and demonstrate competitive scores with state-of-the-art models. We provide evidence that ImageNet pre-training is ill-suited for the medical image captioning task, especially for less frequent conditions (e.g., enlarged cardiomediastinum, lung lesion, pneumothorax). Furthermore, we demonstrate that the double feature model improves performance for specific medical conditions (edema, consolidation, pneumothorax, support devices) and overall CheXbert F1 score, and should be further developed in future work. Such a double feature model, including both ImageNet pre-training as well as domain-specific pre-training, could be used in a wide range of image captioning models in medicine.
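The core of the abstract's "double feature model" is feeding the transformer encoder per-patch embeddings concatenated from two visual extractors, one ImageNet-pretrained and one CheXpert-pretrained. A minimal dependency-free sketch of that fusion step, with random matrices standing in for the real CNN backbones (the extractor function, patch grid, and dimensions here are illustrative assumptions, not the paper's exact implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_patch_features(image, weights):
    # Stand-in for a CNN visual extractor: maps an image to a 7x7 grid
    # of patch features. In the paper these would be ImageNet- or
    # CheXpert-pretrained backbones; a random linear map keeps the
    # sketch self-contained.
    patches = image.reshape(49, -1)          # 49 flattened patches
    return patches @ weights                 # (49, feat_dim)

d_model = 512                                # transformer encoder width
feat_dim = 2048                              # per-extractor feature size
patch_dim = (224 * 224) // 49                # pixels per patch = 1024

image = rng.standard_normal((224, 224))      # a single radiograph

# Two extractors with different (hypothetical) pretrained weights.
w_imagenet = rng.standard_normal((patch_dim, feat_dim)) * 0.01
w_chexpert = rng.standard_normal((patch_dim, feat_dim)) * 0.01

f_imagenet = extract_patch_features(image, w_imagenet)   # (49, 2048)
f_chexpert = extract_patch_features(image, w_chexpert)   # (49, 2048)

# Double-feature fusion: concatenate per-patch embeddings from both
# extractors, then project down to the encoder width d_model.
fused = np.concatenate([f_imagenet, f_chexpert], axis=-1)  # (49, 4096)
w_proj = rng.standard_normal((2 * feat_dim, d_model)) * 0.01
encoder_input = fused @ w_proj                             # (49, 512)

print(encoder_input.shape)  # (49, 512)
```

The point of the sketch is the shape arithmetic: concatenation doubles the feature dimension, so a learned projection is needed before the meshed-memory encoder, while the number of patch tokens (49) is unchanged.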

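The models above are validated with n-gram overlap metrics (BLEU 1-4, ROUGE-L, CIDEr) alongside the clinical CheXbert F1 score. A minimal sketch of BLEU as used for report generation, simplified to a single reference and uniform n-gram weights (the example sentences are invented, not from the paper's data):

```python
from collections import Counter
import math

def bleu(candidate, reference, max_n=4):
    # Modified n-gram precision BLEU (simplified: one reference,
    # uniform weights over n = 1..max_n).
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n])
                       for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n])
                      for i in range(len(reference) - n + 1))
        overlap = sum((cand & ref).values())   # clipped n-gram matches
        total = max(sum(cand.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty discourages overly short candidate reports.
    bp = math.exp(min(0.0, 1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

hyp = "no acute cardiopulmonary process".split()
ref = "no acute cardiopulmonary abnormality".split()
print(bleu(hyp, ref, max_n=1))  # 0.75: three of four unigrams match
```

BLEU-1 here is just clipped unigram precision times the brevity penalty; BLEU-4 multiplies in higher-order n-gram precisions, which is why it rewards fluent phrase-level agreement rather than isolated word matches.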

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ae7/10372225/e07182f09906/gr001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ae7/10372225/bb505c04613d/gr002.jpg

Similar Articles

1. Understanding transfer learning for chest radiograph clinical report generation with modified transformer architectures.
   Heliyon. 2023 Jul 10;9(7):e17968. doi: 10.1016/j.heliyon.2023.e17968. eCollection 2023 Jul.
2. Multi-modal transformer architecture for medical image analysis and automated report generation.
   Sci Rep. 2024 Aug 20;14(1):19281. doi: 10.1038/s41598-024-69981-5.
3. Effective Pre-Training Method and Its Compositional Intelligence for Image Captioning.
   Sensors (Basel). 2022 Apr 30;22(9):3433. doi: 10.3390/s22093433.
4. Analyzing Transfer Learning of Vision Transformers for Interpreting Chest Radiography.
   J Digit Imaging. 2022 Dec;35(6):1445-1462. doi: 10.1007/s10278-022-00666-z. Epub 2022 Jul 11.
5. Cross Encoder-Decoder Transformer with Global-Local Visual Extractor for Medical Image Captioning.
   Sensors (Basel). 2022 Feb 13;22(4):1429. doi: 10.3390/s22041429.
6. Translating medical image to radiological report: Adaptive multilevel multi-attention approach.
   Comput Methods Programs Biomed. 2022 Jun;221:106853. doi: 10.1016/j.cmpb.2022.106853. Epub 2022 May 4.
7. Improving chest X-ray report generation by leveraging warm starting.
   Artif Intell Med. 2023 Oct;144:102633. doi: 10.1016/j.artmed.2023.102633. Epub 2023 Aug 19.
8. CSAMDT: Conditional Self Attention Memory-Driven Transformers for Radiology Report Generation from Chest X-Ray.
   J Imaging Inform Med. 2024 Dec;37(6):2825-2837. doi: 10.1007/s10278-024-01126-6. Epub 2024 Jun 3.
9. A comparative study on deep learning models for text classification of unstructured medical notes with various levels of class imbalance.
   BMC Med Res Methodol. 2022 Jul 2;22(1):181. doi: 10.1186/s12874-022-01665-y.
10. Contrastive pre-training and linear interaction attention-based transformer for universal medical reports generation.
    J Biomed Inform. 2023 Feb;138:104281. doi: 10.1016/j.jbi.2023.104281. Epub 2023 Jan 10.

References Cited in This Article

1. Multi-Modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training.
   IEEE J Biomed Health Inform. 2022 Dec;26(12):6070-6080. doi: 10.1109/JBHI.2022.3207502. Epub 2022 Dec 7.
2. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports.
   Sci Data. 2019 Dec 12;6(1):317. doi: 10.1038/s41597-019-0322-0.