Department of Biomedical Engineering, School of Information Science and Technology, Fudan University, Shanghai 200438, PR China.
Department of Nuclear Medicine, Fudan University Shanghai Cancer Center, Shanghai 201321, PR China.
Comput Methods Programs Biomed. 2024 Jun;251:108216. doi: 10.1016/j.cmpb.2024.108216. Epub 2024 May 11.
Accurate segmentation of the esophageal gross tumor volume (GTV) indirectly enhances the efficacy of radiotherapy for patients with esophageal cancer. In this domain, learning-based methods have been employed to fuse cross-modality positron emission tomography (PET) and computed tomography (CT) images, aiming to improve segmentation accuracy. This fusion is essential because it combines the functional metabolic information of PET with the anatomical information of CT, providing complementary information. While existing three-dimensional (3D) segmentation methods have achieved state-of-the-art (SOTA) performance, they typically rely on pure-convolution architectures, which limits their ability to capture long-range spatial dependencies because convolution is confined to a local receptive field. To address this limitation and further enhance esophageal GTV segmentation performance, this work proposes a transformer-guided cross-modality adaptive feature fusion network, referred to as TransAttPSNN, based on cross-modality PET/CT scans.
Specifically, we establish an attention progressive semantically-nested network (AttPSNN) by incorporating a convolutional attention mechanism into the progressive semantically-nested network (PSNN). Subsequently, we devise a plug-and-play transformer-guided cross-modality adaptive feature fusion module, which is inserted between the multi-scale feature counterparts of a two-stream AttPSNN backbone (one stream for the PET modality and the other for the CT modality), resulting in the proposed TransAttPSNN architecture.
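The abstract does not give the internals of the fusion module, but the core idea of transformer-guided cross-modality fusion can be sketched as scaled dot-product cross-attention between the two streams' feature maps: PET tokens act as queries and CT tokens supply keys and values, so each PET location attends to anatomically relevant CT locations. This is a minimal NumPy illustration of that general mechanism, not the paper's actual implementation; the shapes, the residual connection, and the single-head design are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modality_fusion(pet_feat, ct_feat):
    """Fuse PET and CT features with scaled dot-product cross-attention.

    pet_feat, ct_feat: arrays of shape (n_tokens, dim), e.g. flattened
    multi-scale feature maps from the two backbone streams (hypothetical
    shapes -- the paper's module may differ).
    """
    d = pet_feat.shape[-1]
    # PET tokens as queries, CT tokens as keys/values.
    attn = softmax(pet_feat @ ct_feat.T / np.sqrt(d), axis=-1)  # (n, n)
    fused = attn @ ct_feat                                      # (n, dim)
    # Residual connection keeps the original PET information.
    return pet_feat + fused

rng = np.random.default_rng(0)
pet = rng.normal(size=(16, 8))   # 16 spatial tokens, 8 channels
ct = rng.normal(size=(16, 8))
out = cross_modality_fusion(pet, ct)
print(out.shape)  # (16, 8)
```

In a two-stream architecture such a module would be applied at each scale, with a symmetric copy letting CT features attend to PET features, before the fused maps continue through the decoder.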
Through extensive four-fold cross-validation experiments on a clinical PET/CT cohort, the proposed approach achieves a Dice similarity coefficient (DSC) of 0.76 ± 0.13, a Hausdorff distance (HD) of 9.38 ± 8.76 mm, and a mean surface distance (MSD) of 1.13 ± 0.94 mm, outperforming the SOTA competing methods. The qualitative results show satisfying consistency with the lesion areas.
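For readers unfamiliar with the reported metrics, the DSC measures volumetric overlap between the predicted and ground-truth masks, while the HD measures the worst-case boundary disagreement. A minimal sketch of both (brute-force HD, adequate for small masks; the voxel `spacing` conversion to millimetres is an assumption, not a detail from the paper):

```python
import numpy as np

def dice(pred, gt):
    """Dice similarity coefficient between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def hausdorff(pred, gt, spacing=1.0):
    """Symmetric Hausdorff distance via exhaustive pairwise distances.

    spacing scales voxel indices to physical units (e.g. millimetres).
    """
    a = np.argwhere(pred) * spacing
    b = np.argwhere(gt) * spacing
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

gt = np.zeros((8, 8), bool); gt[2:6, 2:6] = True      # 4x4 square
pred = np.zeros((8, 8), bool); pred[3:7, 2:6] = True  # shifted one row
print(round(dice(pred, gt), 3))  # 0.75 (12 overlapping voxels of 16+16)
print(hausdorff(pred, gt))       # 1.0
```

The MSD is computed analogously to the HD but averages, rather than maximizes, the surface-to-surface distances.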
The devised transformer-guided cross-modality adaptive feature fusion module integrates the strengths of PET and CT, effectively enhancing the segmentation performance for the esophageal GTV. The proposed TransAttPSNN further advances research on esophageal GTV segmentation.