School of Computer Science and Engineering, Northeastern University, Shenyang, China; Key Laboratory of Intelligent Computing in Medical Image of Ministry of Education, Northeastern University, Shenyang, China.
Neural Netw. 2024 Oct;178:106546. doi: 10.1016/j.neunet.2024.106546. Epub 2024 Jul 17.
Current state-of-the-art medical image segmentation techniques predominantly employ the encoder-decoder architecture. Despite its widespread use, this U-shaped framework is limited in its ability to capture multi-scale features effectively through simple skip connections. In this study, we conduct a thorough analysis of the potential weaknesses of skip connections across various segmentation tasks and identify two potential semantic gaps that must be considered: the semantic gap among multi-scale features from different encoding stages, and the semantic gap between the encoder and the decoder. To bridge these gaps, we introduce a novel segmentation framework that incorporates a Dual Attention Transformer (DAT) module for capturing channel-wise and spatial-wise relationships, and a Decoder-guided Recalibration Attention module for fusing DAT tokens with decoder features. Together, these modules establish a learnable connection that resolves the semantic gaps, yielding a high-performance segmentation model for medical images. Furthermore, the framework provides a new paradigm for effectively incorporating the attention mechanism into traditional convolution-based architectures. Comprehensive experimental results demonstrate that our model achieves consistent, significant gains and outperforms state-of-the-art methods with relatively few parameters. This study contributes to the advancement of medical image segmentation by offering a more effective and efficient framework for addressing the limitations of current encoder-decoder architectures. Code: https://github.com/McGregorWwww/UDTransNet.
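To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch of the general technique, not the UDTransNet implementation (the linked repository contains the real code). It assumes that the encoder features from every stage have already been resampled to the same number of spatial tokens and concatenated along the channel axis. The "dual attention" step applies self-attention first across channels (a C x C affinity) and then across spatial positions (an N x N affinity). The fusion step lets decoder tokens attend over the skip tokens and gates the result channel-wise. All function names, the sigmoid gating rule, and the absence of learnable projection weights are hypothetical simplifications chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(tokens):
    # tokens: (N, C), N spatial positions and C channels.
    # Attention is computed between channels, so the affinity matrix is (C, C).
    n = tokens.shape[0]
    affinity = softmax(tokens.T @ tokens / np.sqrt(n), axis=-1)  # (C, C)
    # out[n, i] = sum_j affinity[i, j] * tokens[n, j]
    return tokens @ affinity.T  # (N, C)

def spatial_attention(tokens):
    # Attention between spatial positions: the affinity matrix is (N, N).
    c = tokens.shape[1]
    affinity = softmax(tokens @ tokens.T / np.sqrt(c), axis=-1)  # (N, N)
    return affinity @ tokens  # (N, C)

def dual_attention(tokens):
    # Hypothetical dual-attention block: channel-wise, then spatial-wise
    # attention, each wrapped in a residual connection.
    tokens = tokens + channel_attention(tokens)
    tokens = tokens + spatial_attention(tokens)
    return tokens

def decoder_guided_fusion(skip_tokens, decoder_tokens):
    # Hypothetical decoder-guided fusion: decoder tokens act as queries over
    # the skip tokens (keys and values), and a sigmoid gate computed from the
    # decoder features rescales the attended skip features per channel.
    c = skip_tokens.shape[1]
    attn = softmax(decoder_tokens @ skip_tokens.T / np.sqrt(c), axis=-1)  # (N_dec, N_skip)
    attended = attn @ skip_tokens  # (N_dec, C)
    gate = 1.0 / (1.0 + np.exp(-decoder_tokens.mean(axis=0)))  # (C,)
    return decoder_tokens + attended * gate  # (N_dec, C)

# Usage: three hypothetical encoder stages with 8, 16 and 32 channels,
# each already flattened to 16 spatial tokens.
rng = np.random.default_rng(0)
stages = [rng.standard_normal((16, c)) for c in (8, 16, 32)]
skip = dual_attention(np.concatenate(stages, axis=1))  # (16, 56)
decoder = rng.standard_normal((16, 56))
out = decoder_guided_fusion(skip, decoder)
print(out.shape)  # (16, 56)
```

Concatenating the stages before the channel-attention step is what lets every channel attend to channels from other encoding stages, which is one simple way to mix multi-scale features. Letting the decoder supply the queries is one way to make the skip connection depend on the decoder state rather than passing encoder features through unchanged.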