Fang Kun, He Baochun, Liu Libo, Hu Haoyu, Fang Chihua, Huang Xuguang, Jia Fucang
School for Information and Optoelectronic Science and Engineering, South China Normal University, Guangzhou, China.
Research Center for Medical Artificial Intelligence, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.
Quant Imaging Med Surg. 2023 Mar 1;13(3):1619-1630. doi: 10.21037/qims-22-544. Epub 2023 Feb 10.
Methods that combine transformers with convolutional neural networks (CNNs) have achieved impressive results in medical image segmentation. However, most recently proposed hybrid segmentation approaches simply treat transformers as auxiliary modules that help extract long-range information and encode global context into convolutional representations. How to optimally combine self-attention with convolution has received little investigation.
We designed a novel transformer block (MRFormer) that combines a multi-head self-attention layer and a residual depthwise convolutional block as the basic unit to deeply integrate both long-range and local spatial information. The MRFormer block was embedded between the encoder and decoder of U-Net at the last two layers. This framework (UMRFormer-Net) was applied to three-dimensional (3D) pancreas segmentation, and its ability to effectively capture the characteristic contextual information of the pancreas and surrounding tissues was investigated.
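The pairing of self-attention with a residual depthwise convolution described above can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' implementation: the class name `MRFormerLikeBlock`, the pre-normalization layout, the GELU activation, the 3×3×3 kernel, and the pointwise projection are all assumptions made for the example.

```python
import torch
from torch import nn


class MRFormerLikeBlock(nn.Module):
    """Hypothetical block: multi-head self-attention followed by a
    residual depthwise 3D convolution. Layer choices are assumptions."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # channels must be divisible by num_heads
        self.norm1 = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(channels)
        self.dwconv = nn.Sequential(
            # groups=channels makes the convolution depthwise (one filter per channel)
            nn.Conv3d(channels, channels, kernel_size=3, padding=1, groups=channels),
            nn.GELU(),
            # pointwise convolution mixes information across channels
            nn.Conv3d(channels, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: feature map of shape (batch, channels, depth, height, width)
        b, c, d, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (batch, d*h*w, channels)

        # Long-range context: self-attention over all voxels, with a residual path
        y = self.norm1(tokens)
        attn_out, _ = self.attn(y, y, y, need_weights=False)
        tokens = tokens + attn_out

        # Local context: depthwise convolution on the 3D grid, with a residual path
        grid = tokens.transpose(1, 2).reshape(b, c, d, h, w)
        normed = self.norm2(tokens).transpose(1, 2).reshape(b, c, d, h, w)
        return grid + self.dwconv(normed)


if __name__ == "__main__":
    block = MRFormerLikeBlock(channels=8, num_heads=2)
    features = torch.randn(1, 8, 2, 4, 4)
    print(block(features).shape)
```

Because the block preserves the input shape, a unit of this kind could in principle sit on a skip connection or bottleneck of a U-Net without changing the surrounding layers. Self-attention cost grows quadratically with the number of voxels, which is one plausible reason to restrict such blocks to the deepest, lowest-resolution layers.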
Experimental results show that the proposed UMRFormer-Net achieved pancreas segmentation accuracy comparable or superior to that of existing state-of-the-art 3D methods on both the Clinical Proteomic Tumor Analysis Consortium Pancreatic Ductal Adenocarcinoma (CPTAC-PDA) dataset and the public Medical Segmentation Decathlon dataset (using a self-defined data split). UMRFormer-Net statistically significantly outperformed existing transformer-related methods and state-of-the-art 3D methods (P<0.05, P<0.01, or P<0.001), with a higher Dice coefficient (85.54% and 77.36% on the two datasets, respectively) or a lower 95% Hausdorff distance (4.05 and 8.34 mm, respectively).
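For readers unfamiliar with the two reported metrics, the following pure-Python sketch computes a Dice coefficient and a 95% Hausdorff distance on small point sets. It is an illustration under stated assumptions, not the evaluation code used in the study: the function names are hypothetical, inputs are taken to be collections of voxel coordinates (already scaled to millimetres for the distance), the percentile uses linear interpolation, and the symmetric distance is taken as the larger of the two directed 95th percentiles. Other conventions for the 95% Hausdorff distance exist and can give slightly different values.

```python
import math


def dice(pred, truth):
    """Dice coefficient 2|A∩B| / (|A| + |B|) between two sets of voxel coordinates."""
    a, b = set(pred), set(truth)
    if not a and not b:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def _percentile(values, q):
    """q-th percentile (0 to 100) with linear interpolation between ranks."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def _directed_distances(src, dst):
    """Distance from each point in src to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def hd95(pred_points, truth_points):
    """Symmetric 95% Hausdorff distance between two non-empty point sets."""
    a, b = list(pred_points), list(truth_points)
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    return max(
        _percentile(_directed_distances(a, b), 95),
        _percentile(_directed_distances(b, a), 95),
    )


if __name__ == "__main__":
    pred = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
    truth = {(1, 0, 0), (2, 0, 0), (3, 0, 0)}
    print(dice(pred, truth), hd95(pred, truth))
```

In practice the distance is usually computed between extracted segmentation surfaces rather than all foreground voxels, and the brute-force nearest-neighbour search here would be replaced by a distance transform or spatial index for full 3D volumes.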
UMRFormer-Net yields better-matched and more accurate boundary and region information in pancreas segmentation, thereby improving segmentation accuracy. The code is available at https://github.com/supersunshinefk/UMRFormer-Net.