
[Research on automatic generation of multimodal medical image reports based on a memory-driven method].

Author Information

Xing Suxia, Fang Junze, Ju Zihan, Guo Zheng, Wang Yu

Affiliation

School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, P. R. China.

Publication Information

Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2024 Feb 25;41(1):60-69. doi: 10.7507/1001-5515.202304001.

Abstract

The task of automatic generation of medical image reports faces various challenges, such as diverse disease types and a lack of professionalism and fluency in report descriptions. To address these issues, this paper proposes a memory-driven multimodal medical image report generation method (mMIRmd). First, a hierarchical vision transformer using shifted windows (Swin-Transformer) extracts multi-perspective visual features from patient medical images, and bidirectional encoder representations from transformers (BERT) extracts semantic features from textual medical history information. The visual and semantic features are then fused to strengthen the model's ability to recognize different disease types. Furthermore, a word-vector dictionary pre-trained on medical text is used to encode the labels of the visual features, improving the professionalism of the generated reports. Finally, a memory-driven module is introduced into the decoder to address long-range dependencies in medical image data. The method is validated on the chest X-ray dataset collected at Indiana University (IU X-Ray) and on the medical information mart for intensive care chest X-ray dataset (MIMIC-CXR) released by the Massachusetts Institute of Technology and Massachusetts General Hospital. Experimental results indicate that the proposed method focuses better on the affected areas, improves the accuracy and fluency of report generation, and can assist radiologists in quickly completing medical image reports.
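Since the abstract does not include code, the following is a minimal PyTorch sketch of the two mechanisms it describes: fusing Swin-style visual features with BERT-style text features, and a decoder layer whose self-attention reads from a learned memory bank. All module names, feature dimensions (e.g., 1024-dimensional visual and 768-dimensional text features), and the three-slot memory are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MemoryDrivenDecoderLayer(nn.Module):
    """Transformer decoder layer augmented with a learned memory bank.

    The memory slots are prepended to the decoder input so that every
    decoding step can attend to them, easing long-range dependencies
    (illustrative of a memory-driven module, not the paper's exact design).
    """
    def __init__(self, d_model=512, n_heads=8, n_memory_slots=3):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(n_memory_slots, d_model))
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, tgt, fused_features):
        # Prepend memory slots so self-attention can read from them.
        bsz = tgt.size(0)
        mem = self.memory.unsqueeze(0).expand(bsz, -1, -1)
        tgt_mem = torch.cat([mem, tgt], dim=1)
        attn_out, _ = self.self_attn(tgt_mem, tgt_mem, tgt_mem)
        x = self.norm1(tgt_mem + attn_out)[:, mem.size(1):]  # drop memory slots
        # Cross-attend over the fused visual + text features.
        cross_out, _ = self.cross_attn(x, fused_features, fused_features)
        x = self.norm2(x + cross_out)
        return self.norm3(x + self.ffn(x))

class MultimodalFusion(nn.Module):
    """Project visual and text features to a shared width and concatenate."""
    def __init__(self, d_visual=1024, d_text=768, d_model=512):
        super().__init__()
        self.vis_proj = nn.Linear(d_visual, d_model)
        self.txt_proj = nn.Linear(d_text, d_model)

    def forward(self, visual_feats, text_feats):
        # visual_feats: (B, Nv, d_visual) patch features from a Swin-style encoder
        # text_feats:   (B, Nt, d_text)  token features from a BERT-style encoder
        return torch.cat([self.vis_proj(visual_feats), self.txt_proj(text_feats)], dim=1)

# Shape check with dummy tensors standing in for Swin/BERT outputs.
fusion = MultimodalFusion()
decoder = MemoryDrivenDecoderLayer()
fused = fusion(torch.randn(2, 49, 1024), torch.randn(2, 32, 768))
out = decoder(torch.randn(2, 10, 512), fused)
print(out.shape)  # torch.Size([2, 10, 512])
```

Prepending learned memory slots to the decoder input is one simple way to let each generation step attend to report-writing patterns stored across training, which is the long-range dependency problem the abstract attributes to the memory-driven module.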



