SwinCross: Cross-modal Swin transformer for head-and-neck tumor segmentation in PET/CT images.

Affiliations

Center for Advanced Medical Computing and Analysis, Massachusetts General Hospital/Harvard Medical School, Boston, Massachusetts, USA.

The Russell H Morgan Department of Radiology and Radiological Science, School of Medicine, Johns Hopkins University, Baltimore, Maryland, USA.

Publication information

Med Phys. 2024 Mar;51(3):2096-2107. doi: 10.1002/mp.16703. Epub 2023 Sep 30.

Abstract

BACKGROUND

Radiotherapy (RT) combined with cetuximab is the standard treatment for patients with inoperable head and neck cancers. Segmentation of head and neck (H&N) tumors is a prerequisite for radiotherapy planning but a time-consuming process. In recent years, deep convolutional neural networks (DCNNs) have become the de facto standard for automated image segmentation. However, due to the high computational cost of enlarging the field of view in DCNNs, their ability to model long-range dependencies is still limited, which can result in sub-optimal segmentation performance for objects whose background context spans long distances. Transformer models, on the other hand, have demonstrated excellent capabilities in capturing such long-range information in several semantic segmentation tasks on medical images.

PURPOSE

Despite the impressive representation capacity of vision transformer models, current vision transformer-based segmentation models still suffer from inconsistent and incorrect dense predictions when fed with multi-modal input data. We suspect that their self-attention mechanism may be limited in its ability to extract the complementary information present in multi-modal data. To this end, we propose a novel segmentation model, dubbed Cross-modal Swin Transformer (SwinCross), with a cross-modal attention (CMA) module to incorporate cross-modal feature extraction at multiple resolutions.
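
The abstract does not spell out the internals of the CMA module, but the core idea — one modality's features querying the other's — can be sketched in a few lines of PyTorch. Everything below (module name, token shapes, head count) is an illustrative assumption, not the paper's implementation:

```python
# Hedged sketch of cross-modal attention between PET and CT token streams.
# All names, shapes, and hyperparameters are illustrative assumptions; the
# actual CMA module is defined in the paper and its repository.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # Queries come from one modality, keys/values from the other, so
        # each stream attends to complementary features of its partner.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)

    def forward(self, x_q: torch.Tensor, x_kv: torch.Tensor) -> torch.Tensor:
        # x_q, x_kv: (batch, tokens, dim) token sequences from two modalities.
        kv = self.norm_kv(x_kv)
        out, _ = self.attn(self.norm_q(x_q), kv, kv)
        return x_q + out  # residual connection keeps the query stream intact

# PET tokens query CT tokens and vice versa; the enriched streams can then
# be fused (e.g., concatenated) before the next stage of the encoder.
pet, ct = torch.randn(2, 216, 96), torch.randn(2, 216, 96)
cma = CrossModalAttention(dim=96)
pet_enriched = cma(pet, ct)  # PET attends to CT
ct_enriched = cma(ct, pet)   # CT attends to PET
```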

METHODS

We propose a novel architecture for cross-modal 3D semantic segmentation with two main components: (1) a cross-modal 3D Swin Transformer for integrating information from multiple modalities (PET and CT), and (2) a cross-modal shifted window attention block for learning complementary information from the modalities. To evaluate the efficacy of our approach, we conducted experiments and ablation studies on the HECKTOR 2021 challenge dataset. We compared our method against nnU-Net (the backbone of the top-5 methods in HECKTOR 2021) and other state-of-the-art transformer-based models, including UNETR and Swin UNETR. The experiments employed a five-fold cross-validation setup using PET and CT images.
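
For intuition on the "shifted window" part of the cross-modal shifted window attention block, here is a minimal sketch of 3D window partitioning with a cyclic shift, the Swin Transformer mechanism the block builds on. The window size and shift are illustrative assumptions; SwinCross's actual hyperparameters are not given in the abstract:

```python
# Hedged sketch of 3D (shifted) window partitioning over a feature volume.
# Window size and shift here are illustrative, not the paper's settings.
import torch

def window_partition_3d(x: torch.Tensor, ws: int) -> torch.Tensor:
    # x: (B, D, H, W, C) feature volume; ws: cubic window edge length.
    B, D, H, W, C = x.shape
    x = x.view(B, D // ws, ws, H // ws, ws, W // ws, ws, C)
    # Gather each ws^3 window into its own row of tokens.
    return x.permute(0, 1, 3, 5, 2, 4, 6, 7).reshape(-1, ws ** 3, C)

B, D, H, W, C = 1, 8, 8, 8, 96
pet_feat = torch.randn(B, D, H, W, C)
# Regular windows, then shifted windows (cyclic shift by half the window),
# alternated across successive blocks as in the Swin Transformer; a full
# implementation also masks attention for tokens wrapped across borders.
windows = window_partition_3d(pet_feat, ws=4)                  # (8, 64, 96)
shifted = torch.roll(pet_feat, shifts=(-2, -2, -2), dims=(1, 2, 3))
shifted_windows = window_partition_3d(shifted, ws=4)           # (8, 64, 96)
# Cross-modal attention (see the previous sketch) is then applied within
# each window, with PET windows attending to the corresponding CT windows.
```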

RESULTS

Empirical evidence demonstrates that our proposed method consistently outperforms the compared methods. This success can be attributed to the CMA module's capacity to enhance inter-modality feature representations between PET and CT during head-and-neck tumor segmentation. Notably, SwinCross surpasses Swin UNETR across all five folds, showcasing its proficiency in learning multi-modal feature representations at varying resolutions through the cross-modal attention modules.

CONCLUSIONS

We introduced a cross-modal Swin Transformer for automating the delineation of head and neck tumors in PET and CT images. Our model incorporates a cross-modal attention module, enabling the exchange of features between modalities at multiple resolutions. The experimental results establish the superiority of our method in capturing improved inter-modality correlations between PET and CT for head-and-neck tumor segmentation. Furthermore, the proposed methodology is applicable to other semantic segmentation tasks involving different imaging modalities, such as SPECT/CT or PET/MRI. Code: https://github.com/yli192/SwinCross_CrossModalSwinTransformer_for_Medical_Image_Segmentation.


