MLAgg-UNet: Advancing Medical Image Segmentation with Efficient Transformer and Mamba-Inspired Multi-Scale Sequence

Author information

Jiang Jiaxu, Lei Sen, Li HengChao, Sun Yongjian

Publication information

IEEE J Biomed Health Inform. 2025 Aug 7;PP. doi: 10.1109/JBHI.2025.3596648.

DOI:10.1109/JBHI.2025.3596648

Abstract

Transformers and state space sequence models (SSMs) have attracted interest in biomedical image segmentation for their ability to capture long-range dependencies. However, traditional visual state space (VSS) methods suffer because image tokens are incompatible with the autoregressive assumption. Although Transformer attention does not require this assumption, its high computational cost limits the effective use of channel-wise information. To overcome these limitations, we propose the Mamba-Like Aggregated UNet (MLAgg-UNet), which introduces a Mamba-inspired mechanism to enrich Transformer channel representations and exploits the implicit autoregressive characteristic of the U-shaped architecture. To establish dependencies among image tokens at a single scale, the Mamba-Like Aggregated Attention (MLAgg) block is designed to balance representational ability and computational efficiency. Inspired by the human foveal vision system, the Mamba macro-structure, and differential attention, the MLAgg block can slide its focus over each image token, suppress irrelevant tokens, and simultaneously strengthen the use of channel-wise information. Moreover, leveraging the causal relationships between consecutive low-level and high-level features in the U-shaped architecture, we propose the Multi-Scale Mamba Module with Implicit Causality (MSMM) to optimize complementary information across scales. Embedded within the skip connections, this module enhances semantic consistency between encoder and decoder features. Extensive experiments on four benchmark datasets (AbdomenMRI, ACDC, BTCV, and EndoVis17), which cover MRI, CT, and endoscopy modalities, demonstrate that the proposed MLAgg-UNet consistently outperforms state-of-the-art CNN-based, Transformer-based, and Mamba-based methods. Specifically, it improves the Dice similarity coefficient (DSC) by at least 1.24%, 0.20%, 0.33%, and 0.39% on these datasets, respectively. These results highlight the model's ability to effectively capture feature correlations and integrate complementary multi-scale information, providing a robust solution for medical image segmentation. The implementation is publicly available at https://github.com/aticejiang/MLAgg-UNet.
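The DSC figures reported in the abstract refer to the Dice similarity coefficient, which measures the overlap between a predicted mask and a reference mask as 2|A ∩ B| / (|A| + |B|). The following is a minimal NumPy sketch of how such a score is commonly computed for a multi-class label map. It is not taken from the MLAgg-UNet repository: the function names `dice_score` and `mean_dice`, the choice to average over foreground classes only, and the convention of returning 1.0 when both masks are empty are all assumptions made for illustration.

```python
import numpy as np


def dice_score(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks of equal shape."""
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks are empty; treating this as perfect agreement is an
        # assumed convention, not something stated in the abstract.
        return 1.0
    return float(2.0 * intersection / denom)


def mean_dice(pred_labels: np.ndarray, target_labels: np.ndarray,
              num_classes: int) -> float:
    """Average per-class DSC over foreground classes 1..num_classes-1."""
    scores = [
        dice_score(pred_labels == c, target_labels == c)
        for c in range(1, num_classes)  # class 0 is assumed to be background
    ]
    return float(np.mean(scores))


# Toy 3x3 label maps with background (0) and two foreground classes (1, 2).
pred = np.array([[0, 1, 1],
                 [0, 1, 0],
                 [2, 2, 0]])
target = np.array([[0, 1, 1],
                   [0, 0, 0],
                   [2, 2, 2]])

print(dice_score(pred == 1, target == 1))
print(mean_dice(pred, target, num_classes=3))
```

In this toy case each foreground class has two overlapping pixels out of five labelled pixels in total across prediction and reference, so both the per-class and the mean score are 0.8. Published benchmarks may differ in details such as per-case versus per-dataset averaging, so this sketch should not be read as the evaluation protocol used in the paper.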
