IEEE J Biomed Health Inform. 2024 Nov;28(11):6803-6814. doi: 10.1109/JBHI.2024.3460745. Epub 2024 Nov 6.
The field of 3D medical image segmentation has seen growing use of hybrid networks that combine convolutional neural networks (CNNs) and transformers. However, existing hybrid networks are limited by their simple serial or parallel combination schemes and lack an effective mechanism for fusing channel and spatial feature attention. To address these limitations, we present a robust multi-scale 3D medical image segmentation network, the Transformer-Driven Pyramid Attention Fusion Network (TPAFNet), which leverages a hybrid CNN-transformer structure. Within this framework, we exploit atrous (dilated) convolution to extract multi-scale information effectively, thereby enhancing the transformer's encoding results. Furthermore, we introduce the TPAF block in the encoder to fuse channel and spatial feature attention from multi-scale feature inputs. In contrast to conventional skip connections that simply concatenate or add features, our decoder uses a TPAF connection, improving the integration of feature attention between low-level and high-level features. Additionally, we propose a low-level encoding shortcut from the original input to the decoder output, preserving more original image features and contributing to improved results. Finally, deep supervision is implemented with a novel CNN-based voxel-wise classifier to facilitate better network convergence. Experimental results demonstrate that TPAFNet significantly outperforms other state-of-the-art networks on two public datasets, indicating that our approach can effectively improve the accuracy of medical image segmentation and thereby assist doctors in making more precise diagnoses.
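The abstract does not specify the internals of the TPAF block. Purely as an illustration of the general idea of fusing channel and spatial attention, here is a minimal pure-Python sketch on a tiny feature map; the global-mean pooling and sigmoid gating are assumptions for illustration, not the authors' actual design:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fuse_channel_spatial_attention(x):
    """Illustrative channel + spatial attention fusion (NOT the paper's TPAF block).

    x: list of channels, each a flat list of voxel activations
       (the 3D spatial dimensions are flattened for brevity).
    Channel attention: sigmoid of each channel's global mean activation.
    Spatial attention: sigmoid of each voxel's mean across channels.
    Returns x reweighted by both attention maps.
    """
    n_ch = len(x)
    n_vox = len(x[0])
    # One attention weight per channel (global average pooling + gate).
    ch_att = [sigmoid(sum(ch) / n_vox) for ch in x]
    # One attention weight per voxel (cross-channel pooling + gate).
    sp_att = [sigmoid(sum(x[c][v] for c in range(n_ch)) / n_ch)
              for v in range(n_vox)]
    return [[x[c][v] * ch_att[c] * sp_att[v] for v in range(n_vox)]
            for c in range(n_ch)]
```

In the real network these gates would be learned 3D convolutions operating on multi-scale inputs; the sketch only shows how two independent attention maps can jointly reweight one feature tensor.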