
TransUNet: Rethinking the U-Net architecture design for medical image segmentation through the lens of transformers.

Affiliations

Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA.

Department of Computer Science and Engineering, University of California, Santa Cruz, CA 95064, USA.

Publication information

Med Image Anal. 2024 Oct;97:103280. doi: 10.1016/j.media.2024.103280. Epub 2024 Jul 22.

Abstract

Medical image segmentation is crucial for healthcare, yet convolution-based methods like U-Net face limitations in modeling long-range dependencies. To address this, Transformers designed for sequence-to-sequence prediction have been integrated into medical image segmentation. However, a comprehensive understanding of Transformers' self-attention in U-Net components is lacking. TransUNet, first introduced in 2021, is widely recognized as one of the first models to integrate Transformers into medical image analysis. In this study, we present the versatile framework of TransUNet, which encapsulates Transformers' self-attention into two key modules: (1) a Transformer encoder tokenizing image patches from a convolutional neural network (CNN) feature map, facilitating global context extraction, and (2) a Transformer decoder refining candidate regions through cross-attention between proposals and U-Net features. These modules can be flexibly inserted into the U-Net backbone, resulting in three configurations: Encoder-only, Decoder-only, and Encoder+Decoder. TransUNet provides a library encompassing both 2D and 3D implementations, enabling users to easily tailor the chosen architecture. Our findings highlight the encoder's efficacy in modeling interactions among multiple abdominal organs and the decoder's strength in handling small targets like tumors. TransUNet excels in diverse medical applications, such as multi-organ segmentation, pancreatic tumor segmentation, and hepatic vessel segmentation. Notably, our TransUNet achieves significant average Dice improvements of 1.06% and 4.30% for multi-organ segmentation and pancreatic tumor segmentation, respectively, when compared to the highly competitive nnU-Net, and surpasses the top-1 solution in the BraTS 2021 challenge. The 2D and 3D code and models are available at https://github.com/Beckschen/TransUNet and https://github.com/Beckschen/TransUNet-3D, respectively.
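To make the two modules concrete, below is a minimal PyTorch sketch of the design the abstract describes: a self-attention encoder over patch tokens drawn from a CNN feature map, and a cross-attention decoder that refines learnable region proposals against those features. All class names, layer sizes, the toy input dimensions, and the omission of positional embeddings are simplifying assumptions for illustration, not the authors' implementation; see the linked repositories for the actual 2D and 3D code.

```python
import torch
import torch.nn as nn

class PatchTokenEncoder(nn.Module):
    """Module (1): tokenize a CNN feature map into patch tokens and run
    Transformer self-attention over them for global context.
    Positional embeddings are omitted here for brevity."""
    def __init__(self, in_ch=256, dim=512, heads=8, depth=4):
        super().__init__()
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=1)  # per-patch embedding
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, feat):                                  # (B, C, H, W)
        B, _, H, W = feat.shape
        tokens = self.embed(feat).flatten(2).transpose(1, 2)  # (B, H*W, dim)
        tokens = self.encoder(tokens)                         # global self-attention
        return tokens.transpose(1, 2).reshape(B, -1, H, W)    # back to a feature map

class ProposalDecoder(nn.Module):
    """Module (2): learnable region proposals are refined by cross-attending
    to the image feature tokens (the "memory")."""
    def __init__(self, dim=512, num_proposals=16, heads=8, depth=4):
        super().__init__()
        self.proposals = nn.Parameter(torch.randn(num_proposals, dim))
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=depth)

    def forward(self, feat):                                  # (B, dim, H, W)
        B = feat.shape[0]
        memory = feat.flatten(2).transpose(1, 2)              # (B, H*W, dim)
        queries = self.proposals.unsqueeze(0).expand(B, -1, -1)
        return self.decoder(queries, memory)                  # (B, num_proposals, dim)

# Toy forward pass over a fake CNN feature map.
feat = torch.randn(2, 256, 16, 16)
encoder, decoder = PatchTokenEncoder(), ProposalDecoder()
ctx = encoder(feat)      # globally contextualized map, shape (2, 512, 16, 16)
refined = decoder(ctx)   # refined proposal embeddings, shape (2, 16, 512)
print(ctx.shape, refined.shape)
```

Under this sketch, the three configurations named in the abstract correspond to keeping only PatchTokenEncoder (Encoder-only), only ProposalDecoder (Decoder-only), or chaining both (Encoder+Decoder) inside the U-Net backbone.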

