Cross Encoder-Decoder Transformer with Global-Local Visual Extractor for Medical Image Captioning.

Affiliations

Department of Computer Science and Engineering, Dongguk University, Seoul 04620, Korea.

Department of Artificial Intelligence, Dongguk University, Seoul 04620, Korea.

Publication information

Sensors (Basel). 2022 Feb 13;22(4):1429. doi: 10.3390/s22041429.

DOI:10.3390/s22041429
PMID:35214330
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8874388/
Abstract

Transformer-based approaches have shown good results in image captioning tasks. However, current approaches are limited in that they generate text from the global features of an entire image. We therefore propose two methods for generating better image captions: (1) the Global-Local Visual Extractor (GLVE), which captures both global and local features, and (2) the Cross Encoder-Decoder Transformer (CEDT), which injects multiple-level encoder features into the decoding process. GLVE extracts not only global visual features obtained from the entire image, such as the size of an organ or bone structure, but also local visual features generated from a local region, such as a lesion area. Given an image, CEDT can create a detailed description of the overall features by injecting both low-level and high-level encoder outputs into the decoder. Each method contributes to the performance improvement and to generating descriptions of features such as organ size and bone structure. The proposed model was evaluated on the IU X-ray dataset and outperformed the transformer-based baseline by 5.6% in BLEU score, 0.56% in METEOR, and 1.98% in ROUGE-L.
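The two ideas in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the unprojected single-head attention, the averaging used to fuse encoder levels, and the random stand-in features are all assumptions made for illustration. It only shows the general shape of the approach: global and local visual tokens are joined into one sequence, and the decoder attends to more than one encoder level.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, memory):
    # Single-head scaled dot-product attention, no learned projections (assumption).
    d = queries.shape[-1]
    scores = queries @ memory.T / np.sqrt(d)   # (T, S)
    return softmax(scores, axis=-1) @ memory   # (T, d)

def glve_tokens(global_feats, local_feats):
    # Hypothetical helper: join global and local visual tokens along the sequence axis.
    return np.concatenate([global_feats, local_feats], axis=0)

def multi_level_decode_step(dec_states, encoder_levels):
    # Attend to every encoder level, then average the results.
    # Averaging is an assumed fusion rule, not taken from the paper.
    attended = [cross_attention(dec_states, mem) for mem in encoder_levels]
    return dec_states + np.mean(attended, axis=0)  # residual connection

rng = np.random.default_rng(0)
d_model = 8
global_feats = rng.normal(size=(4, d_model))  # stand-in for whole-image features
local_feats = rng.normal(size=(3, d_model))   # stand-in for local-region features
visual = glve_tokens(global_feats, local_feats)

# Stand-ins for low-level and high-level encoder outputs (hypothetical layers).
low_level = np.tanh(visual)
high_level = np.tanh(low_level @ rng.normal(size=(d_model, d_model)))

dec_states = rng.normal(size=(5, d_model))    # five partially decoded tokens
out = multi_level_decode_step(dec_states, [low_level, high_level])
print(visual.shape, out.shape)  # (7, 8) (5, 8)
```

In a trained model the attention would use learned query, key, and value projections and multiple heads; the sketch leaves these out to keep the data flow visible.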

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5793/8874388/b5da02101c7b/sensors-22-01429-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5793/8874388/3604674b491e/sensors-22-01429-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5793/8874388/8da4487a7ef5/sensors-22-01429-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5793/8874388/91c2eb28989d/sensors-22-01429-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5793/8874388/9f9300ae1d91/sensors-22-01429-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5793/8874388/c1aa69aee044/sensors-22-01429-g005.jpg

Similar articles

1
Cross Encoder-Decoder Transformer with Global-Local Visual Extractor for Medical Image Captioning.
Sensors (Basel). 2022 Feb 13;22(4):1429. doi: 10.3390/s22041429.
2
Dual Global Enhanced Transformer for image captioning.
Neural Netw. 2022 Apr;148:129-141. doi: 10.1016/j.neunet.2022.01.011. Epub 2022 Jan 21.
3
Translating medical image to radiological report: Adaptive multilevel multi-attention approach.
Comput Methods Programs Biomed. 2022 Jun;221:106853. doi: 10.1016/j.cmpb.2022.106853. Epub 2022 May 4.
4
Insights into Object Semantics: Leveraging Transformer Networks for Advanced Image Captioning.
Sensors (Basel). 2024 Mar 11;24(6):1796. doi: 10.3390/s24061796.
5
UAT: Universal Attention Transformer for Video Captioning.
Sensors (Basel). 2022 Jun 25;22(13):4817. doi: 10.3390/s22134817.
6
Style-Enhanced Transformer for Image Captioning in Construction Scenes.
Entropy (Basel). 2024 Mar 1;26(3):224. doi: 10.3390/e26030224.
7
Weakly Supervised Captioning of Ultrasound Images.
Med Image Underst Anal (2022). 2022 Jul;13413:187-198. doi: 10.1007/978-3-031-12053-4_14.
8
Effective Pre-Training Method and Its Compositional Intelligence for Image Captioning.
Sensors (Basel). 2022 Apr 30;22(9):3433. doi: 10.3390/s22093433.
9
Video captioning based on vision transformer and reinforcement learning.
PeerJ Comput Sci. 2022 Mar 16;8:e916. doi: 10.7717/peerj-cs.916. eCollection 2022.
10
Improving chest X-ray report generation by leveraging warm starting.
Artif Intell Med. 2023 Oct;144:102633. doi: 10.1016/j.artmed.2023.102633. Epub 2023 Aug 19.

Cited by

1
Multi-view contrastive learning and symptom extraction insights for medical report generation.
Sci Rep. 2025 May 23;15(1):17991. doi: 10.1038/s41598-025-00570-w.
2
ChestX-Transcribe: a multimodal transformer for automated radiology report generation from chest x-rays.
Front Digit Health. 2025 Jan 21;7:1535168. doi: 10.3389/fdgth.2025.1535168. eCollection 2025.
3
Transforming Healthcare: Artificial Intelligence (AI) Applications in Medical Imaging and Drug Response Prediction.
Genome Integr. 2025 Jan 22;15:e20240002. doi: 10.14293/genint.15.1.002. eCollection 2024.
4
XRaySwinGen: Automatic medical reporting for X-ray exams with multimodal model.
Heliyon. 2024 Mar 12;10(7):e27516. doi: 10.1016/j.heliyon.2024.e27516. eCollection 2024 Apr 15.

References

1
Preparing a collection of radiology examinations for distribution and retrieval.
J Am Med Inform Assoc. 2016 Mar;23(2):304-10. doi: 10.1093/jamia/ocv080. Epub 2015 Jul 1.