
VMDU-net: a dual encoder multi-scale fusion network for polyp segmentation with Vision Mamba and Cross-Shape Transformer integration.

Author Information

Li Peng, Ding Jianhua, Lim Chia S

Affiliations

School of Computing & Technology, Asia Pacific University of Technology & Innovation, Lebuhraya Bukit Jalil, Taman Teknologi Malaysia, Bukit Jalil, Kuala Lumpur, Malaysia.

Gansu Provincial Tumor Hospital, Lanzhou, China.

Publication Information

Front Artif Intell. 2025 Jun 18;8:1557508. doi: 10.3389/frai.2025.1557508. eCollection 2025.

Abstract

INTRODUCTION

Rectal cancer often originates from polyps. Early detection and timely removal of polyps are crucial for preventing colorectal cancer and inhibiting its progression to malignancy. While polyp segmentation algorithms are essential for aiding polyp removal, they face significant challenges due to the diverse shapes, unclear boundaries, and varying sizes of polyps. Additionally, capturing long-range dependencies remains difficult, with many existing algorithms struggling to converge effectively, limiting their practical application.

METHODS

To address these challenges, we propose a novel Dual Encoder Multi-Scale Feature Fusion Network, termed VMDU-Net. This architecture employs two parallel encoders: one incorporates Vision Mamba modules, and the other integrates a custom-designed Cross-Shape Transformer. To enhance semantic understanding of polyp morphology and boundaries, we design a Mamba-Transformer-Merge (MTM) module that performs attention-weighted fusion across spatial and channel dimensions. Furthermore, Depthwise Separable Convolutions are introduced to facilitate multi-scale feature extraction and improve convergence efficiency by leveraging the inductive bias of convolution.
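The efficiency argument for Depthwise Separable Convolutions can be made concrete with a parameter count. This is a generic illustration, not code from the VMDU-Net repository: a standard k×k convolution couples spatial filtering and channel mixing, while the separable variant factors them into a per-channel depthwise convolution followed by a 1×1 pointwise convolution.

```python
# Parameter counts (biases ignored) for a k x k convolution layer
# with c_in input channels and c_out output channels.

def standard_conv_params(k, c_in, c_out):
    # Every output channel has its own k x k x c_in filter.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution that mixes channels
    return depthwise + pointwise

k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)         # 73,728 parameters
sep = depthwise_separable_params(k, c_in, c_out)   # 576 + 8,192 = 8,768
print(std, sep, round(std / sep, 1))               # ~8.4x fewer parameters
```

The roughly k²-fold reduction (plus the strong locality prior of convolution) is what the abstract refers to as leveraging convolution's inductive bias to improve convergence efficiency.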

RESULTS

Extensive experiments were conducted on five widely-used polyp segmentation datasets. The results show that VMDU-Net significantly outperforms existing state-of-the-art methods, especially in terms of segmentation accuracy and boundary detail preservation. Notably, the model achieved a Dice score of 0.934 on the Kvasir-SEG dataset and 0.951 on the CVC-ClinicDB dataset.
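For readers unfamiliar with the reported metric: the Dice score measures the overlap between a predicted mask and the ground-truth mask, ranging from 0 (no overlap) to 1 (perfect agreement). A minimal sketch for flat binary masks (the smoothing term `eps` is a common convention, not necessarily the paper's exact formulation):

```python
def dice_score(pred, target, eps=1e-7):
    """Dice = 2*|A intersect B| / (|A| + |B|) for flat binary masks."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return (2.0 * inter + eps) / (total + eps)

pred   = [1, 1, 0, 1, 0, 0]
target = [1, 0, 0, 1, 1, 0]
print(round(dice_score(pred, target), 3))  # 2*2/(3+3) -> 0.667
```

On this scale, the reported 0.934 (Kvasir-SEG) and 0.951 (CVC-ClinicDB) indicate very close agreement between predicted and annotated polyp regions.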

DISCUSSION

The proposed VMDU-Net effectively addresses key challenges in polyp segmentation by leveraging complementary strengths of Transformer-based and Mamba-based modules. Its strong performance across multiple datasets highlights its potential for practical clinical application in early colorectal cancer prevention.

CODE AVAILABILITY

The source code is publicly available at: https://github.com/sulayman-lee0212/VMDUNet/tree/4a8b95804178511fa5798af4a7d98fd6e6b1ebf7.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/af87/12213873/a614e76004b9/frai-08-1557508-g001.jpg
