
Text-Assisted Vision Model for Medical Image Segmentation.

Authors

Rahman Md Motiur, Rahman Saeka, Bhatt Smriti, Faezipour Miad

Publication

IEEE J Biomed Health Inform. 2025 May 13;PP. doi: 10.1109/JBHI.2025.3569491.

DOI: 10.1109/JBHI.2025.3569491
PMID: 40366847
Abstract

Precise medical image segmentation is important for automating diagnosis and treatment planning in healthcare. While images carry the most significant information for segmenting organs with deep learning models, text reports provide complementary details that can be leveraged to improve segmentation precision. Performance improvement depends on the proper utilization of text reports and the corresponding images. Most attention modules focus on single-modality computation of spatial, channel, or pixel-level attention; they are ineffective at cross-modal alignment, which raises issues in multi-modal scenarios. This study addresses these gaps by presenting a text-assisted vision (TAV) model for medical image segmentation with a novel attention computation module named the triguided attention module (TGAM). TGAM computes visual-visual, language-language, and language-visual attention, enabling the model to understand the important features of, and correlations between, images and medical notes. This module helps the model identify the relevant features within images, within text annotations, and across text-to-visual interactions. We incorporate an attention gate (AG) that modulates the influence of TGAM, ensuring it does not overwhelm the encoded features with irrelevant or redundant information while maintaining their uniqueness. We evaluated the performance of TAV on two popular datasets containing images and corresponding text annotations. We find TAV to be a new state-of-the-art model, as it improves performance by 2-7% compared to other models. Extensive experiments demonstrate the effectiveness of each component of the proposed model. The code and datasets are available on GitHub.

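The three attention streams and the gating step described in the abstract can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the additive fusion of the streams, the mean-pooling of the text stream, and the sigmoid gate are all assumptions made here for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def tri_guided_attention(vis, lang):
    # vis: (N_v, d) visual token embeddings; lang: (N_t, d) text token embeddings
    vv = attention(vis, vis, vis)     # visual-visual self-attention
    ll = attention(lang, lang, lang)  # language-language self-attention
    lv = attention(vis, lang, lang)   # language-visual cross-attention (text guides vision)
    # Fuse the visually aligned streams; pool the text stream so shapes broadcast
    # (the actual fusion rule in the paper may differ - this is an assumption)
    return vv + lv + ll.mean(axis=0, keepdims=True)

def attention_gate(encoded, tgam_out):
    # Sigmoid gate modulating how much TGAM output flows into the encoded features,
    # so redundant cross-modal signal does not overwhelm them
    gate = 1.0 / (1.0 + np.exp(-tgam_out))
    return encoded + gate * tgam_out

rng = np.random.default_rng(0)
vis = rng.normal(size=(16, 32))   # e.g. 16 image-patch embeddings, dim 32
lang = rng.normal(size=(8, 32))   # e.g. 8 report-token embeddings, dim 32
out = attention_gate(vis, tri_guided_attention(vis, lang))
print(out.shape)  # (16, 32): one refined feature per visual token
```

The key point the sketch shows is that all three attention maps are projected back onto the visual token grid, so the gated output keeps the spatial resolution the segmentation decoder needs.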

Similar Articles

1. Text-Assisted Vision Model for Medical Image Segmentation.
   IEEE J Biomed Health Inform. 2025 May 13;PP. doi: 10.1109/JBHI.2025.3569491.
2. SwinCross: Cross-modal Swin transformer for head-and-neck tumor segmentation in PET/CT images.
   Med Phys. 2024 Mar;51(3):2096-2107. doi: 10.1002/mp.16703. Epub 2023 Sep 30.
3. A modality-collaborative convolution and transformer hybrid network for unpaired multi-modal medical image segmentation with limited annotations.
   Med Phys. 2023 Sep;50(9):5460-5478. doi: 10.1002/mp.16338. Epub 2023 Mar 15.
4. LViT: Language Meets Vision Transformer in Medical Image Segmentation.
   IEEE Trans Med Imaging. 2024 Jan;43(1):96-107. doi: 10.1109/TMI.2023.3291719. Epub 2024 Jan 2.
5. MCPL: Multi-Modal Collaborative Prompt Learning for Medical Vision-Language Model.
   IEEE Trans Med Imaging. 2024 Dec;43(12):4224-4235. doi: 10.1109/TMI.2024.3418408. Epub 2024 Dec 2.
6. ETUNet: Exploring efficient transformer enhanced UNet for 3D brain tumor segmentation.
   Comput Biol Med. 2024 Mar;171:108005. doi: 10.1016/j.compbiomed.2024.108005. Epub 2024 Jan 23.
7. A mutual inclusion mechanism for precise boundary segmentation in medical images.
   Front Bioeng Biotechnol. 2024 Dec 24;12:1504249. doi: 10.3389/fbioe.2024.1504249. eCollection 2024.
8. Semi-supervised multi-modal medical image segmentation with unified translation.
   Comput Biol Med. 2024 Jun;176:108570. doi: 10.1016/j.compbiomed.2024.108570. Epub 2024 May 8.
9. MSRA-Net: Tumor segmentation network based on Multi-scale Residual Attention.
   Comput Biol Med. 2023 May;158:106818. doi: 10.1016/j.compbiomed.2023.106818. Epub 2023 Mar 22.
10. Multi-modality self-attention aware deep network for 3D biomedical segmentation.
    BMC Med Inform Decis Mak. 2020 Jul 9;20(Suppl 3):119. doi: 10.1186/s12911-020-1109-0.

Cited By

1. SMF-net: semantic-guided multimodal fusion network for precise pancreatic tumor segmentation in medical CT image.
   Front Oncol. 2025 Jul 18;15:1622426. doi: 10.3389/fonc.2025.1622426. eCollection 2025.