Jia Yanru, Zeng Yu, Guo Huaping
School of Big Data and Artificial Intelligence, Xinyang University, Xinyang, China.
School of Computer and Information Technology, Xinyang Normal University, Xinyang, China.
IET Syst Biol. 2025 Jan-Dec;19(1):e70036. doi: 10.1049/syb2.70036.
Accurate polyp segmentation is crucial for computer-aided diagnosis and early detection of colorectal cancer. Although the feature pyramid network (FPN) and its variants are widely used for polyp segmentation, FPN has inherent limitations: (1) repeated upsampling degrades fine details, reducing accuracy on small polyps, and (2) naive feature fusion (e.g., element-wise summation) inadequately captures global context, limiting performance on complex structures. To address these limitations, we propose a cascaded aggregation network (CANet) that systematically integrates multi-level features for refined representation. CANet adopts the Pyramid Vision Transformer (PVT) as its backbone to extract robust multi-level representations and introduces a cascade aggregation module (CAM) that enriches semantic features without sacrificing spatial details. CAM follows a top-down enhancement pathway in which high-level features progressively guide the fusion of multiscale information, strengthening semantic representation while preserving spatial detail. CANet further integrates a multiscale context-aware module (MCAM) and a residual-based fusion module (RFM). MCAM applies parallel convolutions with diverse kernel sizes and dilation rates to low-level features, enabling fine-grained multiscale extraction of local details and improving scene understanding. RFM fuses these local features with the high-level semantics from CAM, enabling effective cross-level integration. Experiments show that CANet outperforms state-of-the-art methods on both in-distribution and out-of-distribution tests.
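To make the MCAM and RFM descriptions above concrete, the following is a minimal PyTorch sketch, not the authors' implementation: it assumes illustrative channel counts, a particular choice of four parallel branches (kernel sizes and dilation rates are not specified in the abstract), and a simple additive residual for the cross-level fusion.

# Hedged sketch of an MCAM-style block (parallel convolutions with diverse
# kernel sizes and dilation rates over low-level features) and an RFM-style
# residual fusion with high-level semantics. All layer choices are assumptions.
import torch
import torch.nn as nn


class MultiScaleContextAware(nn.Module):
    """Parallel multi-kernel / multi-dilation branches, fused by a 1x1 conv."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=1),                         # point-wise
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),              # local 3x3
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=2, dilation=2),  # dilated context
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=4, dilation=4),  # wider context
        ])
        self.fuse = nn.Conv2d(out_ch * len(self.branches), out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [branch(x) for branch in self.branches]
        return self.fuse(torch.cat(feats, dim=1))


class ResidualFusion(nn.Module):
    """RFM-style cross-level fusion: low-level detail plus upsampled
    high-level semantics, combined through a residual connection (assumed design)."""

    def __init__(self, ch: int):
        super().__init__()
        self.proj = nn.Conv2d(ch * 2, ch, kernel_size=3, padding=1)

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        high_up = nn.functional.interpolate(
            high, size=low.shape[-2:], mode="bilinear", align_corners=False)
        return low + self.proj(torch.cat([low, high_up], dim=1))


if __name__ == "__main__":
    low = torch.randn(1, 64, 88, 88)   # low-level feature map (hypothetical size)
    high = torch.randn(1, 64, 22, 22)  # high-level semantic map, e.g. from CAM
    out = ResidualFusion(64)(MultiScaleContextAware(64, 64)(low), high)
    print(out.shape)  # torch.Size([1, 64, 88, 88])

In this sketch the residual addition preserves the low-level spatial detail while the projected concatenation injects high-level semantics, mirroring the cross-level integration role the abstract assigns to RFM.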