Ji Zexuan, Chen Zheng, Ma Xiao
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.
Sci Rep. 2025 Apr 1;15(1):11122. doi: 10.1038/s41598-025-95361-8.
Medical image segmentation plays a pivotal role in clinical diagnosis and pathological research by delineating regions of interest within medical images. While early approaches based on Convolutional Neural Networks (CNNs) have achieved significant success, their limited receptive field constrains their ability to capture long-range dependencies. Recent advances in Vision Transformers (ViTs) have demonstrated remarkable improvements by leveraging self-attention mechanisms. However, existing ViT-based segmentation models often struggle to effectively capture multi-scale variations within a single attention layer, limiting their capacity to model complex anatomical structures. To address this limitation, we propose Grouped Multi-Scale Attention (GMSA), which enhances multi-scale feature representation by grouping channels and performing self-attention at different scales within a single layer. Additionally, we introduce Inter-Scale Attention (ISA) to facilitate cross-scale feature fusion, further improving segmentation performance. Extensive experiments on the Synapse, ACDC, and ISIC2018 datasets demonstrate the effectiveness of our model, achieving state-of-the-art results in medical image segmentation. Our code is available at: https://github.com/Chen2zheng/ScaleFormer.
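As a rough illustration of the grouping idea described above (a minimal sketch, not the authors' released implementation; see the linked repository for that), the PyTorch snippet below splits the channel dimension into groups and lets each group attend with keys and values pooled at a different spatial scale, so a single layer mixes several receptive-field sizes. The module name, scale ratios, pooling choice, and shapes are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedMultiScaleAttention(nn.Module):
    """Sketch of grouped multi-scale self-attention: channels are split
    into groups, and each group's keys/values are average-pooled at its
    own spatial ratio before attention. Illustrative only."""

    def __init__(self, dim, scales=(1, 2, 4, 8)):
        super().__init__()
        assert dim % len(scales) == 0
        self.gdim = dim // len(scales)          # channels per group
        self.scales = scales                    # per-group K/V pooling ratio
        self.q = nn.ModuleList(nn.Linear(self.gdim, self.gdim) for _ in scales)
        self.kv = nn.ModuleList(nn.Linear(self.gdim, 2 * self.gdim) for _ in scales)
        self.proj = nn.Linear(dim, dim)         # fuse groups after concatenation

    def forward(self, x, H, W):
        # x: (B, N, C) token sequence with N = H * W
        B, N, C = x.shape
        outs = []
        for g, s in enumerate(self.scales):
            xg = x[:, :, g * self.gdim:(g + 1) * self.gdim]    # (B, N, gdim)
            q = self.q[g](xg)                                  # full-resolution queries
            # Pool the group's tokens by factor s to form coarser keys/values.
            ctx = xg.transpose(1, 2).reshape(B, self.gdim, H, W)
            ctx = F.adaptive_avg_pool2d(ctx, (max(H // s, 1), max(W // s, 1)))
            ctx = ctx.flatten(2).transpose(1, 2)               # (B, ~N/s^2, gdim)
            k, v = self.kv[g](ctx).chunk(2, dim=-1)
            attn = (q @ k.transpose(-2, -1)) / self.gdim ** 0.5
            outs.append(attn.softmax(dim=-1) @ v)              # (B, N, gdim)
        return self.proj(torch.cat(outs, dim=-1))              # (B, N, C)

if __name__ == "__main__":
    x = torch.randn(2, 16 * 16, 64)             # 2 samples, 16x16 tokens, 64 channels
    out = GroupedMultiScaleAttention(64)(x, 16, 16)
    print(out.shape)                            # torch.Size([2, 256, 64])
```

In this sketch the final linear projection is the only place the groups interact; in the paper that cross-scale fusion is handled by the dedicated Inter-Scale Attention (ISA) module, which this toy example does not reproduce.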