School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China.
Wuhan Zhongke Industrial Research Institute of Medical Science Co., Ltd, Wuhan, China.
Med Phys. 2024 Nov;51(11):8371-8389. doi: 10.1002/mp.17354. Epub 2024 Aug 13.
Precise glioma segmentation from multi-parametric magnetic resonance (MR) images is essential for brain glioma diagnosis. However, due to the indistinct boundaries between tumor sub-regions and the heterogeneous appearance of gliomas in volumetric MR scans, designing a reliable and automated glioma segmentation method remains challenging. Although existing 3D Transformer-based or convolution-based segmentation networks have obtained promising results via multi-modal feature fusion strategies or contextual learning methods, they generally lack the capability for hierarchical interaction between different modalities and cannot effectively learn comprehensive feature representations for all glioma sub-regions.
To overcome these problems, in this paper, we propose a 3D hierarchical cross-modality interaction network (HCMINet) using Transformers and convolutions for accurate multi-modal glioma segmentation. It leverages an effective hierarchical cross-modality interaction strategy to fully learn the modality-specific and modality-shared knowledge relevant to glioma sub-region segmentation from multi-parametric MR images.
In the HCMINet, we first design a hierarchical cross-modality interaction Transformer (HCMITrans) encoder to hierarchically encode and fuse heterogeneous multi-modal features via Transformer-based intra-modal embeddings and inter-modal interactions across multiple encoding stages, which effectively captures complex cross-modality correlations while modeling global contexts. Then, we pair the HCMITrans encoder with a modality-shared convolutional encoder to form a dual-encoder architecture in the encoding stage, which learns rich contextual information from both global and local perspectives. Finally, in the decoding stage, we present a progressive hybrid context fusion (PHCF) decoder that progressively fuses the local and global features extracted by the dual-encoder architecture, using a local-global context fusion (LGCF) module to efficiently alleviate the contextual discrepancy among the decoding features.
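To make the inter-modal interaction idea concrete, the following is a minimal NumPy sketch of single-head cross-attention between token streams from two MR modalities: queries come from one modality and keys/values from another, so each stage can mix modality-specific features. The function name, the single-head unprojected form, and the toy shapes are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(q_feats, kv_feats):
    # q_feats: (N, d) tokens of the querying modality
    # kv_feats: (N, d) tokens of the attended modality
    d = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d)  # (N, N) similarity
    weights = softmax(scores, axis=-1)          # rows sum to 1
    return weights @ kv_feats                   # modality-mixed features

# toy example: two modality token streams (hypothetical sizes)
rng = np.random.default_rng(0)
N, d = 8, 16
t1_tokens = rng.standard_normal((N, d))     # e.g., T1-weighted tokens
flair_tokens = rng.standard_normal((N, d))  # e.g., FLAIR tokens

fused = cross_modal_attention(t1_tokens, flair_tokens)
print(fused.shape)  # (8, 16): one fused vector per T1 query token
```

In a full encoder this interaction would be applied symmetrically in both directions and repeated at each encoding stage, so cross-modality correlations are accumulated hierarchically rather than fused only once.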
Extensive experiments are conducted on two public and competitive glioma benchmark datasets: the BraTS2020 dataset with 494 patients and the BraTS2021 dataset with 1251 patients. Results show that our proposed method outperforms existing Transformer-based and CNN-based methods that use other multi-modal fusion strategies. Specifically, the proposed HCMINet achieves state-of-the-art mean DSC values of 85.33% and 91.09% on the BraTS2020 online validation dataset and the BraTS2021 local testing dataset, respectively.
Our proposed method can accurately and automatically segment glioma regions from multi-parametric MR images, which is beneficial for the quantitative analysis of brain gliomas and helpful for reducing the annotation burden of neuroradiologists.