Multi-scale semantic enhancement network for object detection.

Affiliations

School of Computer and Software, Nanyang Institute of Technology, 80 Changjiang Road, Nanyang, 473004, Henan, China.

Chongqing Engineering Research Center for Spatial Big Data Intelligent Technology, Chongqing University of Posts and Telecommunications, No. 2, Chongwen Road, Chongqing, 400065, Chongqing, China.

Publication information

Sci Rep. 2023 May 3;13(1):7178. doi: 10.1038/s41598-023-34277-7.

DOI:10.1038/s41598-023-34277-7

PMID:37137973

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10156693/

Abstract

In object detection, the feature pyramid network (FPN) can effectively extract multi-scale information. However, most FPN-based methods suffer from a semantic gap between features at different scales before feature fusion, which can lead to feature maps with significant aliasing. In this paper, we present a novel multi-scale semantic enhancement feature pyramid network (MSE-FPN), which alleviates these problems with three modules: a semantic enhancement module, a semantic injection module, and a gated channel guidance module. Specifically, inspired by the strong ability of the self-attention mechanism to model context, we propose a semantic enhancement module that models global context to obtain global semantic information before feature fusion. We then propose a semantic injection module that divides the global semantic information and merges it into feature maps at various scales, narrowing the semantic gap between features at different scales and making efficient use of the semantic information in high-level features. Finally, to mitigate the feature aliasing caused by feature fusion, the gated channel guidance module selectively outputs crucial features via a gating unit. By replacing FPN with MSE-FPN in Faster R-CNN, our models achieve 39.4 and 41.2 average precision (AP) using ResNet50 and ResNet101 as the backbone network, respectively. With ResNet-101-64x4d as the backbone, MSE-FPN achieves up to 43.4 AP. Our results demonstrate that replacing FPN with MSE-FPN significantly enhances the detection performance of state-of-the-art FPN-based detectors.
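The three-stage data flow described in the abstract (global context from self-attention, injection into every scale, then a channel gate) can be illustrated with a toy sketch in plain Python. This is not the authors' implementation: the abstract gives no layer definitions, so everything here is an assumption made for illustration. In particular, the attention uses no learned projections (Q = K = V), the injection is a simple broadcast addition rather than the paper's divide-and-merge scheme, the gate is a parameter-free sigmoid over pooled channel means, and all function names are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    # Scaled dot-product self-attention with Q = K = V = tokens
    # (assumption: no learned projections). tokens: N positions x C channels.
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return out

def global_context(tokens):
    # Stand-in for "semantic enhancement": average the attended positions
    # into one C-dimensional global vector.
    attended = self_attention(tokens)
    n = len(attended)
    return [sum(t[d] for t in attended) / n for d in range(len(attended[0]))]

def inject(levels, context):
    # Stand-in for "semantic injection": add the same global vector to
    # every position of every scale (assumption, not the paper's scheme).
    return [[[v + c for v, c in zip(tok, context)] for tok in level] for level in levels]

def channel_gate(tokens):
    # Stand-in for "gated channel guidance": a sigmoid gate per channel,
    # computed from the channel's global average (no learned weights here).
    n = len(tokens)
    dim = len(tokens[0])
    pooled = [sum(t[d] for t in tokens) / n for d in range(dim)]
    gates = [1.0 / (1.0 + math.exp(-p)) for p in pooled]
    return [[g * v for g, v in zip(gates, tok)] for tok in tokens], gates

# Two pyramid levels, each a list of positions with 2 channels.
levels = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]],  # finer scale, 4 positions
    [[2.0, -1.0]],                                      # coarsest scale, 1 position
]
context = global_context(levels[-1])      # context from the highest-level features
enhanced = inject(levels, context)        # same semantics pushed to all scales
outputs = [channel_gate(level)[0] for level in enhanced]
```

A real MSE-FPN would operate on 4-D tensors with learned convolutional and attention weights; the sketch only shows the order of operations and that each stage preserves the shape of the per-scale feature maps.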
