Liu Dongfang, Liang James, Geng Tony, Loui Alexander, Zhou Tianfei
IEEE Trans Image Process. 2023;32:2678-2692. doi: 10.1109/TIP.2023.3272826. Epub 2023 May 16.
Learning pyramidal feature representations is important for many dense prediction tasks (e.g., object detection, semantic segmentation) that demand multi-scale visual understanding. Feature Pyramid Network (FPN) is a well-known architecture for multi-scale feature learning; however, intrinsic weaknesses in its feature extraction and fusion impede the production of informative features. This work addresses these weaknesses through a novel tripartite feature enhanced pyramid network (TFPN) with three distinct and effective designs. First, we develop a feature reference module with lateral connections that adaptively extracts bottom-up features with richer details for feature pyramid construction. Second, we design a feature calibration module between adjacent layers that spatially aligns the upsampled features, so that features are fused with accurate correspondences. Third, we introduce a feature feedback module into FPN, which creates a communication channel from the feature pyramid back to the bottom-up backbone and doubles the encoding capacity, enabling the entire architecture to generate incrementally more powerful representations. TFPN is extensively evaluated on four popular dense prediction tasks: object detection, instance segmentation, panoptic segmentation, and semantic segmentation. The results demonstrate that TFPN consistently and significantly outperforms the vanilla FPN. Our code is available at https://github.com/jamesliang819.
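For orientation, the fusion step that the abstract attributes to the vanilla FPN baseline can be sketched as follows. This is a minimal illustration of the baseline only, not the authors' TFPN code: it assumes single-channel feature maps stored as nested lists, replaces the lateral 1x1 convolutions with the identity, and uses nearest-neighbour 2x upsampling. The names `upsample_nearest_2x`, `add_maps`, and `fpn_top_down` are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map, rows x cols


def upsample_nearest_2x(fmap: Grid) -> Grid:
    """Double height and width by repeating each value (nearest neighbour)."""
    out: Grid = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two equally sized maps."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("feature maps must have the same shape")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fpn_top_down(bottom_up: List[Grid]) -> List[Grid]:
    """Fuse backbone maps ordered fine -> coarse, each half the size of the last.

    Assumption: lateral connections are the identity (no 1x1 convolution).
    """
    pyramid = [bottom_up[-1]]  # the coarsest level is used as-is
    for lateral in reversed(bottom_up[:-1]):
        upsampled = upsample_nearest_2x(pyramid[0])
        pyramid.insert(0, add_maps(lateral, upsampled))
    return pyramid


# Toy backbone outputs at three scales: 4x4, 2x2, 1x1 (made-up values).
c2 = [[1.0] * 4 for _ in range(4)]
c3 = [[10.0, 20.0], [30.0, 40.0]]
c4 = [[100.0]]
p2, p3, p4 = fpn_top_down([c2, c3, c4])
```

In this sketch the upsampled map is added to the lateral map with no alignment step. According to the abstract, that is the point where the feature calibration module acts, aligning the upsampled features spatially before fusion; how the module does so is not specified in the abstract and is not modelled here.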