Zhang Zheyuan, Bagci Ulas
Northwestern University, Evanston, IL 60201, USA.
Mach Learn Med Imaging. 2022 Sep;13583:171-180. doi: 10.1007/978-3-031-21014-3_18. Epub 2022 Dec 16.
Transformer-based neural networks have achieved promising performance on many biomedical image segmentation tasks, owing to the better global information modeling afforded by the self-attention mechanism. However, most methods are still designed for 2D medical images and ignore essential 3D volume information. The main challenge for 3D Transformer-based segmentation methods is the quadratic complexity introduced by the self-attention mechanism [17]. In this paper, we address these two research gaps, the lack of 3D methods and the computational complexity of Transformers, by proposing a novel Transformer architecture with an encoder-decoder design and linear complexity. Furthermore, we introduce a dynamic token concept to further reduce the number of tokens used in the self-attention calculation. Taking advantage of the global information modeling, we provide uncertainty maps from different hierarchy stages. We evaluate this method on multiple challenging CT pancreas segmentation datasets. Our results show that our novel 3D Transformer-based segmenter provides promising segmentation performance and accurate uncertainty quantification using a single annotation. Code is available at https://github.com/freshman97/LinTransUNet.
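To illustrate how self-attention can be reduced from quadratic to linear complexity in the token count, the sketch below uses a kernel feature map in the style of linearized attention (e.g., elu(x)+1). This is a minimal illustrative example under assumed single-head, unbatched inputs, not the paper's exact formulation; the `feature_map` choice is an assumption.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: a positive kernel feature map commonly used in
    # linearized attention. Illustrative choice, not necessarily the
    # formulation used in LinTransUNet.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(Q, K, V):
    """Linearized self-attention: O(N) in the token count N.

    Standard attention softmax(Q K^T) V materializes an N x N score
    matrix, which is quadratic in N. Approximating the scores as
    phi(Q) phi(K)^T lets us reorder the product to phi(Q) (phi(K)^T V),
    so the N x N matrix is never formed.
    """
    Qp, Kp = feature_map(Q), feature_map(K)   # (N, d) each
    context = Kp.T @ V                        # (d, d_v): cost O(N * d * d_v)
    normalizer = Qp @ Kp.sum(axis=0)          # (N,) per-token normalization
    return (Qp @ context) / normalizer[:, None]
```

Because each output row is a normalized weighted average of the rows of `V`, the memory and compute cost grows linearly with the number of 3D volume tokens, which is what makes volumetric self-attention tractable.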