Zhou Wenyi, Shi Ziyang, Xie Bin, Li Fang, Yin Jiehao, Zhang Yongzhong, Hu Linan, Li Lin, Yan Yongming, Wei Xiajun, Hu Zhen, Luo Zhengmao, Peng Wanxiang, Xie Xiaochun, Long Xiaoli
School of Electronic Information and Physics, Central South University of Forestry and Technology, Changsha, China.
Department of Radiology, Zhuzhou Hospital Affiliated to Xiangya School of Medicine, Central South University, Zhuzhou, China.
Front Oncol. 2025 Jul 18;15:1622426. doi: 10.3389/fonc.2025.1622426. eCollection 2025.
Accurate and automated segmentation of pancreatic tumors from CT images via deep learning is essential for the clinical diagnosis of pancreatic cancer. However, two key challenges persist: (a) complex phenotypic variations in pancreatic morphology cause segmentation models to focus predominantly on healthy tissue over tumors, compromising tumor feature extraction and segmentation accuracy; (b) existing methods often struggle to retain fine-grained local features, leading to performance degradation in pancreas-tumor segmentation.
To overcome these limitations, we propose SMF-Net (Semantic-Guided Multimodal Fusion Network), a novel multimodal medical image segmentation framework integrating a CNN-Transformer hybrid encoder. The framework incorporates AMBERT, a progressive feature extraction module, and the Multimodal Token Transformer (MTT) to fuse visual and semantic features for enhanced tumor localization. Additionally, the Multimodal Enhanced Attention Module (MEAM) further improves the retention of local discriminative features. To address multimodal data scarcity, we adopt a semi-supervised learning paradigm based on a Dual-Adversarial-Student Network (DAS-Net). Furthermore, in collaboration with Zhuzhou Central Hospital, we constructed the Multimodal Pancreatic Tumor Dataset (MPTD).
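The abstract does not detail how the MTT fuses visual and semantic tokens; a common pattern for such fusion is cross-attention, in which each visual token attends over the text tokens and absorbs a weighted semantic summary. The sketch below is a hypothetical, dependency-free illustration of that general idea, not the paper's actual MTT implementation; all function names are our own.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attend(visual_tokens, text_tokens):
    """Fuse each visual token with an attention-weighted summary
    of the semantic (text) tokens. Tokens are plain lists of floats
    sharing one embedding dimension."""
    dim = len(visual_tokens[0])
    fused = []
    for v in visual_tokens:
        # Scaled dot-product scores of this visual token vs. every text token.
        scores = [dot(v, t) / math.sqrt(dim) for t in text_tokens]
        weights = softmax(scores)
        # Attention-weighted semantic summary, same dimensionality as v.
        summary = [sum(w * t[i] for w, t in zip(weights, text_tokens))
                   for i in range(dim)]
        # Residual-style fusion: visual token plus semantic summary.
        fused.append([vi + si for vi, si in zip(v, summary)])
    return fused
```

In a real CNN-Transformer hybrid this would operate on learned embeddings with projection matrices for queries, keys, and values; the sketch keeps only the attention-and-fuse skeleton.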
The experimental results on the MPTD indicate that our model achieved Dice scores of 79.25% and 64.21% for pancreas and tumor segmentation, respectively, showing improvements of 2.24% and 4.18% over the original model. Furthermore, the model outperformed existing state-of-the-art methods on the QaTa-COVID-19 and MosMedData lung infection segmentation datasets in terms of average Dice scores, demonstrating its strong generalization ability.
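The Dice score used to report these results measures overlap between a predicted mask and the ground truth: Dice = 2|A∩B| / (|A| + |B|). A minimal reference implementation over flattened binary masks (our own helper, not code from the paper):

```python
def dice_score(pred, target):
    """Dice coefficient for two equal-length binary masks
    (flattened lists of 0/1). Returns 1.0 for two empty masks."""
    intersection = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * intersection / total

pred   = [1, 1, 0, 1, 0, 0]
target = [1, 0, 0, 1, 1, 0]
print(dice_score(pred, target))  # 2*2 / (3+3) = 0.666...
```

A Dice of 64.21% for tumors versus 79.25% for the whole pancreas reflects the usual difficulty gap: small, low-contrast tumor regions are penalized heavily by any boundary error under this metric.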
The experimental results demonstrate that SMF-Net delivers accurate segmentation of pancreas, tumor, and pulmonary regions, highlighting its strong potential for real-world clinical applications.