Liu Fang, Zhang YanDuo, Lu Tao, Wang Jiaming, Wang LiWei
Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan, 430205, China.
Wuhan Technology and Business University, Wuhan, 430065, China.
Sci Rep. 2025 Jul 2;15(1):23017. doi: 10.1038/s41598-025-07466-9.
Fusing multimodal data plays a crucial role in accurate brain tumor segmentation and clinical diagnosis, especially in scenarios with incomplete multimodal data. Existing multimodal fusion models usually perform inter-modal fusion at both shallow and deep layers, relying predominantly on traditional attention fusion. However, using the same fusion strategy at different layers leads to critical issues: feature redundancy in shallow layers, due to repetitive weighting of semantically similar low-level features, and progressive texture-detail degradation in deeper layers, caused by the inherent characteristics of deep neural networks. Additionally, the absence of intra-modal fusion results in the loss of unique critical information. To better enhance the representation of latent correlated features drawn from each modality's unique critical features, this paper proposes a Hierarchical In-Out Fusion method. The Out-Fusion block performs inter-modal fusion at both shallow and deep layers: in the shallow layers, the SAOut-Fusion block uses self-attention to extract texture information; at the deepest layer of the network, the DDOut-Fusion block integrates spatial- and frequency-domain features, compensating for the loss of texture detail by enhancing high-frequency components, and employs a gating mechanism to effectively combine the tumor's positional and structural information with its texture details. At the same time, the In-Fusion block is designed for intra-modal fusion, using multiple stacked Transformer-CNN blocks to hierarchically access modality-specific critical signatures. Experimental results on the BraTS2018 and BraTS2020 datasets validate the superiority of this method, demonstrating improved network robustness and maintained effectiveness even when certain modalities are missing. Our code is available at https://github.com/liufangcoca-515/InOutFusion-main.
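The abstract describes the DDOut-Fusion block only at a high level (spatial- plus frequency-domain features combined through a gate). The sketch below is an illustrative interpretation of that idea, not the authors' implementation: the module name, channel handling, and the crude low-pass mask used to isolate high-frequency detail are all assumptions introduced for illustration.

```python
import torch
import torch.nn as nn


class GatedSpatialFrequencyFusion(nn.Module):
    """Illustrative sketch (not the paper's DDOut-Fusion code): add back
    high-frequency (texture) detail to spatial features via a learned gate."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1x1 conv producing a per-voxel gate in [0, 1] from both branches
        self.gate = nn.Sequential(
            nn.Conv3d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, spatial_feat: torch.Tensor) -> torch.Tensor:
        # Frequency branch: FFT over the spatial dims, zero out the
        # low-frequency centre, and transform back, keeping mostly
        # high-frequency (texture) content. The mask size is arbitrary here.
        freq = torch.fft.fftshift(
            torch.fft.fftn(spatial_feat, dim=(-3, -2, -1)), dim=(-3, -2, -1)
        )
        d, h, w = spatial_feat.shape[-3:]
        mask = torch.ones_like(freq.real)
        mask[..., d // 2 - d // 8: d // 2 + d // 8,
                  h // 2 - h // 8: h // 2 + h // 8,
                  w // 2 - w // 8: w // 2 + w // 8] = 0.0
        high = torch.fft.ifftn(
            torch.fft.ifftshift(freq * mask, dim=(-3, -2, -1)), dim=(-3, -2, -1)
        ).real

        # Gate decides, per location, how much high-frequency detail to blend
        # back into the positional/structural (spatial) features.
        g = self.gate(torch.cat([spatial_feat, high], dim=1))
        return spatial_feat + g * high
```

Under these assumptions, the gate lets the network keep structural context where texture is noisy and restore fine detail where it was lost in the deep layers.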