Text-Assisted Vision Model for Medical Image Segmentation.

Author Information

Rahman Md Motiur, Rahman Saeka, Bhatt Smriti, Faezipour Miad

Publication Information

IEEE J Biomed Health Inform. 2025 May 13;PP. doi: 10.1109/JBHI.2025.3569491.

Abstract

Precise medical image segmentation is important for automating diagnosis and treatment planning in healthcare. While images carry the most significant information for segmenting organs with deep learning models, text reports provide complementary details that can be leveraged to improve segmentation precision. The improvement depends on properly utilizing the text reports together with the corresponding images. Most attention modules focus on single-modality computation of spatial, channel, or pixel-level attention; they are ineffective at cross-modal alignment, which causes problems in multi-modal scenarios. This study addresses these gaps by presenting a text-assisted vision (TAV) model for medical image segmentation with a novel attention computation module named the triguided attention module (TGAM). TGAM computes visual-visual, language-language, and language-visual attention, enabling the model to capture the important features and the correlation between images and medical notes. This module helps the model identify the relevant features within images, within text annotations, and in text-annotation-to-visual interactions. We incorporate an attention gate (AG) that modulates the influence of TGAM, ensuring it does not overwhelm the encoded features with irrelevant or redundant information while maintaining their uniqueness. We evaluated the performance of TAV on two popular datasets containing images and corresponding text annotations. TAV sets a new state of the art, improving performance by 2-7% over other models. Extensive experiments demonstrate the effectiveness of each component of the proposed model. The code and datasets are available on GitHub.
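The abstract's description of TGAM (visual-visual, language-language, and language-visual attention) and the attention gate can be illustrated with a minimal sketch. The paper's exact formulation is not given in the abstract, so everything below is an assumption: standard scaled dot-product attention stands in for each of the three attention branches, and a simple sigmoid gate stands in for the AG. All function names (`tri_guided_attention`, `attention_gate`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d)) V."""
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def tri_guided_attention(vis, txt):
    """Hypothetical sketch of the three TGAM branches.

    vis: (N_v, d) visual tokens; txt: (N_t, d) text tokens.
    Returns visual self-attention, language self-attention, and
    language-guided visual attention (visual queries over text keys/values).
    """
    vv = attention(vis, vis, vis)   # visual-visual
    ll = attention(txt, txt, txt)   # language-language
    lv = attention(vis, txt, txt)   # language-visual
    return vv, ll, lv

def attention_gate(encoded, attended):
    """Hypothetical additive gate: a sigmoid coefficient modulates how much
    of the attended features is mixed into the encoder features, so the
    attention output does not overwhelm them."""
    alpha = 1.0 / (1.0 + np.exp(-(encoded + attended)))  # element-wise gate
    return encoded + alpha * attended
```

In this sketch the language-visual branch produces features aligned with the visual tokens (shape `(N_v, d)`), so the gate can fuse them directly with the encoder output; how the actual model fuses the three branches is not specified in the abstract.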

