Jin Yu, Tian Rui, Yu Qian, Bai Yu, Chao Guoqing, Liu Danqing, Guo Yanhui
School of Computer Science, Qinghai Normal University, Xining, China.
School of Data and Computer Science, Shandong Women's University, Jinan, China.
Quant Imaging Med Surg. 2025 Apr 1;15(4):3064-3083. doi: 10.21037/qims-24-1983. Epub 2025 Mar 28.
Pixel-level medical image segmentation tasks are challenging due to factors such as variable target scales, complex geometric shapes, and low contrast. Although U-shaped hybrid networks have demonstrated strong performance, existing models often fail to effectively integrate the local features captured by convolutional neural networks (CNNs) with the global features provided by Transformers. Moreover, their self-attention mechanisms often lack adequate emphasis on critical spatial and channel information. To address these challenges, our goal was to develop a hybrid deep learning model that can effectively and robustly segment medical images, including but not limited to computed tomography (CT) and magnetic resonance (MR) images.
We propose an effective hybrid U-shaped network, named the effective multi-scale context aggregation hybrid network (EMCAH-Net). It integrates an effective multi-scale context aggregation (EMCA) block in the backbone, along with a dual-attention augmented self-attention (DASA) block embedded in the skip connections and bottleneck layers. Tailored to the characteristics of medical images, the former block focuses on fine-grained local multi-scale feature encoding, whereas the latter enhances global representation learning by adaptively combining spatial and channel attention with self-attention. This approach not only effectively integrates local multi-scale and global features but also reinforces skip connections, thereby highlighting segmentation targets and precisely delineating boundaries. The code is publicly available at https://github.com/AloneIsland/EMCAH-Net.
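As a concrete illustration of the DASA design described above, the following is a minimal PyTorch sketch of how channel attention, spatial attention, and multi-head self-attention can be adaptively fused on tokenized feature maps. The module name, tensor shapes, gating layers, and learnable fusion weights are illustrative assumptions rather than the authors' implementation; see the repository above for the actual code.

```python
import torch
import torch.nn as nn

class DASABlockSketch(nn.Module):
    """Hypothetical sketch of a dual-attention augmented self-attention block:
    self-attention fused with channel and spatial attention via learnable weights."""
    def __init__(self, dim, num_heads=4, reduction=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Channel attention: squeeze-and-excitation style gate over the channel dimension.
        self.channel_gate = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim), nn.Sigmoid(),
        )
        # Spatial attention: per-token gate computed from pooled channel statistics.
        self.spatial_gate = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())
        # Learnable scalars to adaptively weight the three branches.
        self.alpha = nn.Parameter(torch.ones(3))

    def forward(self, x):                      # x: (batch, tokens, channels)
        h = self.norm(x)
        sa, _ = self.attn(h, h, h)             # global self-attention branch
        ca = h * self.channel_gate(h.mean(dim=1, keepdim=True))   # channel-attention branch
        stats = torch.cat([h.mean(-1, keepdim=True),
                           h.amax(-1, keepdim=True)], dim=-1)
        pa = h * self.spatial_gate(stats)      # spatial-attention branch
        w = torch.softmax(self.alpha, dim=0)   # adaptive fusion weights
        return x + w[0] * sa + w[1] * ca + w[2] * pa   # residual combination
```

For example, DASABlockSketch(dim=256)(torch.randn(2, 196, 256)) returns a tensor of the same shape, which is the form a skip-connection or bottleneck refinement stage would consume.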
Compared to previous state-of-the-art (SOTA) methods, the EMCAH-Net achieves outstanding performance in medical image segmentation, with Dice similarity coefficient (DSC) scores of 84.73% (+2.85), 92.33% (+0.27), and 82.47% (+0.76) on the Synapse, automated cardiac diagnosis challenge (ACDC), and digital retinal images for vessel extraction (DRIVE) datasets, respectively. Additionally, it maintains computational efficiency in terms of model parameters and floating point operations (FLOPs). For instance, EMCAH-Net surpasses TransUNet on the Synapse dataset by 7.25% in DSC while requiring only 25% of the parameters and 71% of the FLOPs.
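Here, DSC refers to the standard Dice similarity coefficient, which measures region overlap between a predicted mask $P$ and a reference mask $G$:

\[
\mathrm{DSC}(P, G) = \frac{2\,|P \cap G|}{|P| + |G|}
\]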
EMCAH-Net has demonstrated significant advantages in segmenting multi-scale, small, and boundary-blurred features in medical images. Extensive experiments on abdominal multi-organ, cardiac, and retinal vessel segmentation tasks confirm that EMCAH-Net surpasses previous methods, including pure CNN, pure Transformer, and hybrid architectures.