Zheng Shenhai, Tan Jiaxin, Jiang Chuangbo, Li Laquan
College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, People's Republic of China.
Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing 400065, People's Republic of China.
Phys Med Biol. 2023 Jan 9;68(2). doi: 10.1088/1361-6560/aca74c.
In recent years, convolutional neural network (CNN)-based methods have dominated the field of medical image segmentation, but their main drawback is difficulty representing long-range dependencies. Recently, the Transformer has demonstrated superior performance in computer vision and has also been successfully applied to medical image segmentation, owing to its self-attention mechanism and its ability to encode long-range dependencies in images. To the best of our knowledge, only a few works have focused on cross-modality image segmentation with the Transformer. Hence, the main objective of this study was to design, propose, and validate a deep learning method that extends the Transformer to multi-modality medical image segmentation.

This paper proposes a novel automated multi-modal Transformer network, termed AMTNet, for 3D medical image segmentation. The network follows a U-shaped architecture in which substantial changes have been made to the feature encoding, fusion, and decoding parts. The encoding part comprises 3D embedding, 3D multi-modal Transformer, and 3D Co-learn down-sampling blocks. Symmetrically, the decoding part comprises 3D Transformer, up-sampling, and 3D expanding blocks. In addition, an adaptive channel-interleaved Transformer feature fusion module is designed to fully fuse the features of the different modalities.

We provide a comprehensive experimental analysis on the Prostate and BraTS2021 datasets. Our method achieves an average DSC of 0.907 and 0.851 (0.734 for ET, 0.895 for TC, and 0.924 for WT) on these two datasets, respectively. These results show that AMTNet yields significant improvements over state-of-the-art segmentation networks.

The proposed 3D segmentation network exploits the complementary features of the different modalities at multiple scales during feature extraction, enriching the 3D feature representations and improving segmentation efficiency. This network broadens the application of the Transformer to multi-modal medical image segmentation.
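The paper itself does not include code. For a concrete picture of the fusion idea described above, the following is a minimal PyTorch sketch of how a channel-interleaved fusion of two modality feature maps, followed by self-attention, could work. The class name `ChannelInterleavedFusion`, the pre-norm attention layout, and all parameter choices are illustrative assumptions, not the authors' AMTNet implementation.

```python
import torch
import torch.nn as nn


class ChannelInterleavedFusion(nn.Module):
    """Hypothetical sketch: interleave the channels of two modality-specific
    3D feature maps, then let self-attention mix them. Not the authors' code."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Self-attention over the fused token sequence (one token per voxel).
        self.norm = nn.LayerNorm(2 * channels)
        self.attn = nn.MultiheadAttention(2 * channels, num_heads, batch_first=True)
        # Project the fused representation back to the per-modality width.
        self.proj = nn.Linear(2 * channels, channels)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        # feat_a, feat_b: (B, C, D, H, W) features from two modalities.
        b, c, d, h, w = feat_a.shape
        # Interleave channels: [a0, b0, a1, b1, ...] -> (B, 2C, D, H, W).
        fused = torch.stack((feat_a, feat_b), dim=2).reshape(b, 2 * c, d, h, w)
        # Flatten spatial dims into a token sequence: (B, D*H*W, 2C).
        tokens = fused.flatten(2).transpose(1, 2)
        # Pre-norm self-attention with a residual connection. In practice a
        # windowed/patched attention would be used, since D*H*W can be large.
        normed = self.norm(tokens)
        tokens = tokens + self.attn(normed, normed, normed, need_weights=False)[0]
        out = self.proj(tokens)  # (B, D*H*W, C)
        # Restore the 3D layout: (B, C, D, H, W).
        return out.transpose(1, 2).reshape(b, c, d, h, w)


# Usage sketch: fusing features from two hypothetical MRI sequences.
fusion = ChannelInterleavedFusion(channels=32)
t1 = torch.randn(1, 32, 8, 16, 16)  # e.g. T1-weighted feature map
t2 = torch.randn(1, 32, 8, 16, 16)  # e.g. T2-weighted feature map
fused = fusion(t1, t2)              # (1, 32, 8, 16, 16)
```

Interleaving (rather than concatenating blockwise) places corresponding channels of the two modalities next to each other, so each attention head sees paired modality evidence per channel; this is one plausible reading of "channel-interleaved" fusion, stated here as an assumption.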