Quan Yu, Zhang Dong, Zhang Liyan, Tang Jinhui
IEEE Trans Image Process. 2023;32:4341-4354. doi: 10.1109/TIP.2023.3297408. Epub 2023 Aug 2.
The visual feature pyramid has shown its superiority in both effectiveness and efficiency across a variety of applications. However, current methods focus heavily on inter-layer feature interactions while disregarding the importance of intra-layer feature regulation. Although some methods attempt to learn a compact intra-layer feature representation using attention mechanisms or vision transformers, they overlook the corner regions that are crucial for dense prediction tasks. To address this problem, we propose a Centralized Feature Pyramid (CFP) network for object detection, which is based on a globally explicit centralized feature regulation. Specifically, we first propose a spatially explicit visual center scheme, in which a lightweight MLP captures the global long-range dependencies and a parallel learnable visual center mechanism captures the local corner regions of the input images. On this basis, we then propose a globally centralized regulation for the commonly used feature pyramid in a top-down fashion, where the explicit visual center information obtained from the deepest intra-layer feature is used to regulate the preceding shallow features. Compared with existing feature pyramids, CFP not only captures the global long-range dependencies but also efficiently obtains an all-round yet discriminative feature representation. Experimental results on the challenging MS-COCO dataset validate that our proposed CFP achieves consistent performance gains on the state-of-the-art YOLOv5 and YOLOX object detection baselines.
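To make the two components of the abstract concrete, the following is a minimal sketch (not the authors' released code): an intra-layer block that runs a lightweight MLP branch (global long-range dependencies) in parallel with a learnable visual-center branch (local detail), plus a top-down pass in which the deepest level's output regulates the shallower pyramid levels. All module names, channel sizes, the codebook-style formulation of the visual center, and the fuse-by-concatenation and resize-and-add choices are illustrative assumptions, not the paper's exact design.

# Sketch of an explicit-visual-center-style block and top-down regulation.
# Assumptions: PyTorch, equal channel counts across pyramid levels,
# concatenation + 1x1 conv for fusion, nearest-neighbor resizing.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightweightMLP(nn.Module):
    """Token-wise MLP over flattened spatial positions (global context branch)."""
    def __init__(self, channels: int, hidden_ratio: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.fc1 = nn.Linear(channels, channels * hidden_ratio)
        self.fc2 = nn.Linear(channels * hidden_ratio, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)              # (B, H*W, C)
        tokens = tokens + self.fc2(F.gelu(self.fc1(self.norm(tokens))))
        return tokens.transpose(1, 2).reshape(b, c, h, w)

class LearnableVisualCenter(nn.Module):
    """Soft-assigns pixels to K learnable codewords and gates channels (local branch)."""
    def __init__(self, channels: int, num_codewords: int = 64):
        super().__init__()
        self.codewords = nn.Parameter(torch.randn(num_codewords, channels))
        self.scale = nn.Parameter(torch.ones(num_codewords))
        self.fc = nn.Linear(channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        feats = x.flatten(2).transpose(1, 2)               # (B, N, C)
        diff = feats.unsqueeze(2) - self.codewords          # (B, N, K, C)
        assign = torch.softmax(-self.scale * diff.pow(2).sum(-1), dim=2)  # (B, N, K)
        encoded = (assign.unsqueeze(-1) * diff).sum(1).mean(1)            # (B, C)
        gate = torch.sigmoid(self.fc(encoded)).view(b, c, 1, 1)
        return x * gate                                     # channel-wise regulation

class ExplicitVisualCenter(nn.Module):
    """Runs both branches in parallel and fuses them with a 1x1 convolution."""
    def __init__(self, channels: int):
        super().__init__()
        self.mlp = LightweightMLP(channels)
        self.lvc = LearnableVisualCenter(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([self.mlp(x), self.lvc(x)], dim=1))

def centralized_regulation(pyramid, evc: ExplicitVisualCenter):
    """Top-down regulation: the EVC output of the deepest level is resized and
    added to every shallower level (a simple illustrative fusion choice)."""
    center = evc(pyramid[-1])
    out = []
    for feat in pyramid[:-1]:
        up = F.interpolate(center, size=feat.shape[-2:], mode="nearest")
        out.append(feat + up)
    out.append(center)
    return out

if __name__ == "__main__":
    channels = 256
    feats = [torch.randn(1, channels, s, s) for s in (64, 32, 16)]  # shallow -> deep
    evc = ExplicitVisualCenter(channels)
    for f in centralized_regulation(feats, evc):
        print(f.shape)

In an actual detector, the regulated pyramid levels would replace the original ones before being passed to the detection head; the sketch only demonstrates the shape-preserving flow of the centralized regulation.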