Optimizing transformer-based network via advanced decoder design for medical image segmentation.

Author information

Yang Weibin, Dong Zhiqi, Xu Mingyuan, Xu Longwei, Geng Dehua, Li Yusong, Wang Pengwei

Affiliation

School of Information Science and Engineering, Shandong University, Tsingtao, 266237, People's Republic of China.

Publication information

Biomed Phys Eng Express. 2025 Feb 5;11(2). doi: 10.1088/2057-1976/adaec7.

DOI:10.1088/2057-1976/adaec7

PMID:39869936

Abstract

U-Net is widely used in medical image segmentation because of its simple and flexible architecture. To address the challenges of scale and complexity in medical tasks, several variants of U-Net have been proposed. In particular, Vision Transformer (ViT)-based methods, exemplified by Swin UNETR, have attracted wide attention in recent years. However, these improvements often focus on the encoder and overlook the crucial role of the decoder in refining segmentation details. This design imbalance limits further gains in segmentation performance. To address this issue, we analyze the roles of the main decoder components, namely the upsampling method, the skip connections, and the feature extraction module, as well as the shortcomings of existing methods. On this basis, we propose Swin DER (i.e., Swin UNETR Decoder Enhanced and Refined), which specifically optimizes the design of these three components. Swin DER performs upsampling with a learnable interpolation algorithm called offset coordinate neighborhood weighted upsampling (Onsampling) and replaces traditional skip connections with a spatial-channel parallel attention gate (SCP AG). Additionally, Swin DER introduces deformable convolution together with an attention mechanism in the feature extraction module of the decoder. Our model outperforms other state-of-the-art methods on both the Synapse dataset and the MSD brain tumor segmentation task. Code is available at:.
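
The abstract describes Onsampling only as a learnable interpolation that weights a neighborhood around offset coordinates. As a rough illustration of that general idea (not the authors' implementation, whose details are not given here), the pure-Python sketch below upsamples a single-channel grid by mapping each output pixel to a source coordinate, shifting it by a per-pixel offset, and taking a bilinear-weighted sum of the surrounding 2×2 neighborhood. The function names `sample_bilinear` and `offset_upsample`, the half-pixel coordinate convention, the border clamping, and the fixed offset grid are all assumptions; in a trained network the offsets would presumably be predicted by learnable layers rather than passed in by hand.

```python
import math
from typing import List, Optional, Tuple

Grid = List[List[float]]
OffsetGrid = List[List[Tuple[float, float]]]


def sample_bilinear(img: Grid, y: float, x: float) -> float:
    """Bilinearly sample img at a real-valued (y, x), clamping to the border."""
    h, w = len(img), len(img[0])
    # Clamp the coordinate into the valid image area (border padding).
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    # Distance-based weights over the 2x2 neighborhood around (y, x).
    return ((1 - wy) * (1 - wx) * img[y0][x0]
            + (1 - wy) * wx * img[y0][x1]
            + wy * (1 - wx) * img[y1][x0]
            + wy * wx * img[y1][x1])


def offset_upsample(img: Grid, scale: int,
                    offsets: Optional[OffsetGrid] = None) -> Grid:
    """Hypothetical offset-based upsampling: each output pixel samples the
    input at its base source coordinate plus a per-pixel (dy, dx) offset."""
    h, w = len(img), len(img[0])
    out: Grid = []
    for i in range(h * scale):
        row = []
        for j in range(w * scale):
            # Base source coordinate (half-pixel-centre convention, assumed).
            sy = (i + 0.5) / scale - 0.5
            sx = (j + 0.5) / scale - 0.5
            if offsets is not None:
                # Assumption: offsets are given here; a real model would learn them.
                dy, dx = offsets[i][j]
                sy += dy
                sx += dx
            row.append(sample_bilinear(img, sy, sx))
        out.append(row)
    return out
```

With every offset at zero this reduces to ordinary bilinear upsampling, for example `offset_upsample([[0.0, 1.0], [2.0, 3.0]], 2)`. Nonzero offsets let each output pixel draw from a shifted location, which is the same offset-then-interpolate sampling step that deformable convolution relies on.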
