
[Research on automatic generation of multimodal medical image reports based on a memory-driven method].

Author Information

Xing Suxia, Fang Junze, Ju Zihan, Guo Zheng, Wang Yu

Affiliation

School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, P. R. China.

Publication Information

Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2024 Feb 25;41(1):60-69. doi: 10.7507/1001-5515.202304001.

Abstract

The task of automatic generation of medical image reports faces various challenges, such as diverse disease types and a lack of professionalism and fluency in report descriptions. To address these issues, this paper proposes a memory-driven multimodal medical image report generation method (mMIRmd). First, a hierarchical vision transformer using shifted windows (Swin-Transformer) extracts multi-perspective visual features from patient medical images, and bidirectional encoder representations from transformers (BERT) extracts semantic features from textual medical history information. The visual and semantic features are then fused to strengthen the model's ability to recognize different disease types. Furthermore, a word-vector dictionary pre-trained on medical text is used to encode the labels of the visual features, improving the professionalism of the generated reports. Finally, a memory-driven module is introduced into the decoder to address long-range dependencies in medical image data. The method is validated on the chest X-ray dataset collected at Indiana University (IU X-Ray) and on the medical information mart for intensive care chest X-ray dataset (MIMIC-CXR) released by the Massachusetts Institute of Technology and Massachusetts General Hospital. Experimental results indicate that the proposed method focuses better on the affected areas, improves the accuracy and fluency of report generation, and can assist radiologists in quickly completing medical image reports.
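Since the abstract does not include code, the following is a minimal PyTorch sketch of the two mechanisms it describes: fusing Swin-style visual features with BERT-style text features, and a decoder layer whose self-attention reads from a learned memory bank. All module names, feature dimensions (e.g., 1024-dimensional visual and 768-dimensional text features), and the three-slot memory are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MemoryDrivenDecoderLayer(nn.Module):
    """Transformer decoder layer augmented with a learned memory bank.

    The memory slots are prepended to the decoder input so that every
    decoding step can attend to them, easing long-range dependencies
    (illustrative of a memory-driven module, not the paper's exact design).
    """
    def __init__(self, d_model=512, n_heads=8, n_memory_slots=3):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(n_memory_slots, d_model))
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, tgt, fused_features):
        # Prepend memory slots so self-attention can read from them.
        bsz = tgt.size(0)
        mem = self.memory.unsqueeze(0).expand(bsz, -1, -1)
        tgt_mem = torch.cat([mem, tgt], dim=1)
        attn_out, _ = self.self_attn(tgt_mem, tgt_mem, tgt_mem)
        x = self.norm1(tgt_mem + attn_out)[:, mem.size(1):]  # drop memory slots
        # Cross-attend over the fused visual + text features.
        cross_out, _ = self.cross_attn(x, fused_features, fused_features)
        x = self.norm2(x + cross_out)
        return self.norm3(x + self.ffn(x))

class MultimodalFusion(nn.Module):
    """Project visual and text features to a shared width and concatenate."""
    def __init__(self, d_visual=1024, d_text=768, d_model=512):
        super().__init__()
        self.vis_proj = nn.Linear(d_visual, d_model)
        self.txt_proj = nn.Linear(d_text, d_model)

    def forward(self, visual_feats, text_feats):
        # visual_feats: (B, Nv, d_visual) patch features from a Swin-style encoder
        # text_feats:   (B, Nt, d_text)  token features from a BERT-style encoder
        return torch.cat([self.vis_proj(visual_feats), self.txt_proj(text_feats)], dim=1)

# Shape check with dummy tensors standing in for Swin/BERT outputs.
fusion = MultimodalFusion()
decoder = MemoryDrivenDecoderLayer()
fused = fusion(torch.randn(2, 49, 1024), torch.randn(2, 32, 768))
out = decoder(torch.randn(2, 10, 512), fused)
print(out.shape)  # torch.Size([2, 10, 512])
```

Prepending learned memory slots to the decoder input is one simple way to let each generation step attend to report-writing patterns stored across training, which is the long-range dependency problem the abstract attributes to the memory-driven module.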



