Computer School, University of South China, Hengyang, 421001, China.
College of Mechanical and Vehicle Engineering, Hunan University, Changsha, 410082, China.
Med Biol Eng Comput. 2023 Mar;61(3):661-671. doi: 10.1007/s11517-022-02723-9. Epub 2022 Dec 29.
Medical image segmentation is a critical step in many imaging applications. Automatic segmentation using convolutional neural networks (CNNs) has attracted extensive attention. However, traditional CNN-based methods fail to extract global and long-range contextual information because convolution is a local operation. Transformers overcome this limitation of CNN-based models. Inspired by the success of transformers in computer vision (CV), many researchers have focused on designing transformer-based U-shaped architectures for medical image segmentation. However, purely transformer-based approaches cannot effectively capture fine-grained local details. This paper proposes a dual-encoder network combining a transformer and a CNN for multi-organ segmentation. The new segmentation framework takes full advantage of both the CNN and the transformer to enhance segmentation accuracy. The Swin Transformer encoder extracts global information, and the CNN encoder captures local information. We introduce fusion modules that fuse the convolutional features with the sequence of features from the transformer. The fused features are passed through skip connections, which effectively smooths the decision boundary. We extensively evaluate our method on the Synapse multi-organ CT dataset and the Automated Cardiac Diagnosis Challenge (ACDC) dataset. The results show that the proposed method achieves Dice similarity coefficient (DSC) scores of 80.68% and 91.12% on the Synapse multi-organ CT and ACDC datasets, respectively. Ablation studies on the ACDC dataset demonstrate the effectiveness of the critical components of our method. Our results match the ground-truth boundaries more consistently than existing models, and our approach produces more accurate results on challenging 2D images for multi-organ segmentation. Compared with state-of-the-art methods, the proposed method achieves superior performance on multi-organ segmentation tasks. Graphical Abstract: The key process in medical image segmentation.
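The fusion step described in the abstract, reshaping the transformer's token sequence back to a spatial grid, concatenating it with the CNN feature map, and projecting the result, can be illustrated with a minimal NumPy sketch. The shapes, the 1x1 projection, and the ReLU activation below are illustrative assumptions, not the paper's actual fusion module.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative dimensions: an 8x8 feature grid, 32 CNN channels,
# 48 transformer channels, 64 output channels after fusion.
H, W = 8, 8
c_cnn, c_tr, c_out = 32, 48, 64

# Local features from the CNN encoder: (H, W, c_cnn).
cnn_feat = rng.standard_normal((H, W, c_cnn))

# Global features from the Swin Transformer encoder arrive as a token
# sequence of length H*W; reshape them back onto the spatial grid.
tokens = rng.standard_normal((H * W, c_tr))
tr_feat = tokens.reshape(H, W, c_tr)

# Fuse by channel-wise concatenation, as the abstract describes.
fused = np.concatenate([cnn_feat, tr_feat], axis=-1)  # (H, W, c_cnn + c_tr)

# A 1x1 convolution is equivalent to a per-pixel matrix multiply; apply
# a random projection plus ReLU as a stand-in for the learned fusion layer.
W_proj = rng.standard_normal((c_cnn + c_tr, c_out)) * 0.02
out = np.maximum(fused @ W_proj, 0.0)  # (H, W, c_out)

print(fused.shape, out.shape)
```

The fused map `out` would then be concatenated into the decoder through a skip connection at the matching resolution.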