IEEE Trans Pattern Anal Mach Intell. 2022 Feb;44(2):985-1001. doi: 10.1109/TPAMI.2020.3012414. Epub 2022 Jan 7.
For efficiency, deep-learning-based detectors typically detect objects at multiple scales from a single input image. Recent works, such as FPN and SSD, generally use feature maps from multiple layers with different spatial resolutions to detect objects at different scales, e.g., high-resolution feature maps for small objects. However, we find that objects at all scales can also be detected well with features from a single layer of the network. In this paper, we carefully examine the factors that affect detection performance across a large range of scales, and conclude that the key is the balance of training samples, both positive and negative, across scales. We propose a group sampling method that divides the anchors into several groups according to scale and ensures that each group contributes the same number of samples during training. Using only a single layer of FPN as features, our approach advances the state of the art. Comprehensive analysis and extensive experiments demonstrate the effectiveness of the proposed method. Moreover, we show that our approach transfers well to other tasks, such as object detection on the COCO dataset, and to other detection pipelines, such as YOLOv3, SSD and R-FCN. Evaluated on face detection benchmarks including the FDDB and WIDER FACE datasets, our approach achieves state-of-the-art results without bells and whistles.
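The group sampling idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `group_sample`, the scale boundaries in `group_edges`, the per-group budget `per_group`, and the positive share `pos_fraction` are all hypothetical choices made for illustration, since the abstract states only that anchors are grouped by scale and that every group supplies the same number of training samples.

```python
import bisect
import random
from collections import defaultdict


def group_sample(anchor_scales, labels, group_edges, per_group,
                 pos_fraction=0.25, seed=0):
    """Sample anchors so that each scale group contributes equally.

    anchor_scales: anchor sizes (e.g. side length in pixels).
    labels: 1 for a positive anchor, 0 for a negative one.
    group_edges: sorted scale boundaries (assumed values, not from the paper).
    per_group: target number of samples drawn from every group.
    pos_fraction: assumed share of positives within each group's budget.
    Returns a dict mapping group id -> list of sampled anchor indices.
    """
    rng = random.Random(seed)
    groups = defaultdict(lambda: {"pos": [], "neg": []})

    # Assign every anchor to a scale group and split by label.
    for idx, (scale, label) in enumerate(zip(anchor_scales, labels)):
        group_id = bisect.bisect_right(group_edges, scale)
        groups[group_id]["pos" if label == 1 else "neg"].append(idx)

    sampled = {}
    for group_id, members in groups.items():
        # Cap each draw by what the group actually contains.
        n_pos = min(len(members["pos"]), int(per_group * pos_fraction))
        n_neg = min(len(members["neg"]), per_group - n_pos)
        sampled[group_id] = (rng.sample(members["pos"], n_pos)
                             + rng.sample(members["neg"], n_neg))
    return sampled


if __name__ == "__main__":
    # Toy data: many small anchors, few large ones (an imbalanced pool).
    scales = [16] * 100 + [64] * 40 + [256] * 20
    labels = [1] * 10 + [0] * 90 + [1] * 8 + [0] * 32 + [1] * 6 + [0] * 14
    picked = group_sample(scales, labels, group_edges=[32, 128], per_group=16)
    print({g: len(ix) for g, ix in sorted(picked.items())})
    # prints {0: 16, 1: 16, 2: 16}
```

In this toy pool the small-scale group holds five times as many anchors as the large-scale group, yet each group ends up with the same 16 samples, which is the balance across scales that the abstract identifies as the key factor. A group with too few anchors would simply contribute fewer samples in this sketch; how the original method handles that case is not stated in the abstract.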