IEEE Trans Med Imaging. 2024 Jun;43(6):2036-2049. doi: 10.1109/TMI.2023.3336237. Epub 2024 Jun 3.
Positron emission tomography (PET) is widely used in clinics and research due to its quantitative merits and high sensitivity, but suffers from a low signal-to-noise ratio (SNR). Recently, convolutional neural networks (CNNs) have been widely used to improve PET image quality. Though successful and efficient in local feature extraction, CNNs cannot capture long-range dependencies well due to their limited receptive field. Global multi-head self-attention (MSA) is a popular approach to capturing long-range information; however, computing global MSA for 3D images is computationally expensive. In this work, we propose Spach Transformer, an efficient spatial and channel-wise encoder-decoder transformer that leverages spatial and channel information based on local and global MSAs. Experiments on datasets of different PET tracers, i.e., 18F-FDG, 18F-ACBC, 18F-DCFPyL, and 68Ga-DOTATATE, were conducted to evaluate the proposed framework. Quantitative results show that the proposed Spach Transformer framework outperforms state-of-the-art deep learning architectures.
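The cost argument above — spatial MSA scales quadratically with the number of voxels while channel-wise MSA scales with the number of channels — can be illustrated with a minimal NumPy sketch. This is not the paper's Spach Transformer implementation; the identity Q/K/V projections and the function names are illustrative assumptions, used only to show where the quadratic term falls in each attention variant.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(x):
    # x: (N, C) — N spatial tokens (voxels), C channels.
    # The attention matrix is N x N, so the cost grows
    # quadratically with the number of voxels — this is what
    # makes global spatial MSA expensive for 3D PET volumes.
    q, k, v = x, x, x  # identity projections, for illustration only
    attn = softmax(q @ k.T / np.sqrt(x.shape[1]), axis=-1)
    return attn @ v

def channel_attention(x):
    # The attention matrix here is C x C, independent of the
    # number of voxels, so channel-wise MSA stays cheap even
    # for large 3D inputs.
    q, k, v = x.T, x.T, x.T  # tokens are now channels
    attn = softmax(q @ k.T / np.sqrt(x.shape[0]), axis=-1)
    return (attn @ v).T

x = np.random.randn(64, 8)          # 64 "voxels", 8 channels
print(spatial_attention(x).shape)   # (64, 8)
print(channel_attention(x).shape)   # (64, 8)
```

Both variants return a tensor of the original shape; the difference is only in which axis the quadratic attention matrix is built over, which is the trade-off the proposed framework exploits by mixing spatial and channel-wise MSAs.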