Wu Zhihao, Wen Jie, Xu Yong, Yang Jian, Li Xuelong, Zhang David
IEEE Trans Neural Netw Learn Syst. 2022 Jun 8;PP. doi: 10.1109/TNNLS.2022.3178180.
Weakly supervised object detection (WSOD) has become an effective paradigm that requires only class labels to train object detectors. However, WSOD detectors tend to learn highly discriminative features that correspond to local parts of objects rather than complete objects, which results in imprecise object localization. One feasible solution is to design backbones specifically for WSOD. However, a redesigned backbone generally has to be pretrained on large-scale ImageNet or trained from scratch, and both options cost far more time and computation than fine-tuning. In this article, we explore how to optimize the backbone while keeping the original pretrained model usable. Because the pooling layer summarizes neighborhood features, it is crucial for spatial feature learning. Moreover, it has no learnable parameters, so modifying it does not change the pretrained model. Based on this analysis, we propose enhanced spatial feature learning (ESFL) for WSOD. ESFL first takes full advantage of multiple kernels in a single pooling layer to handle multiscale objects, and then enhances above-average activations within a rectangular neighborhood to alleviate the problem of ignoring unsalient object parts. Experimental results on the PASCAL VOC and MS COCO benchmarks demonstrate that ESFL brings significant performance improvements to the WSOD method and achieves state-of-the-art results.
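The two ESFL steps described above can be illustrated with a minimal NumPy sketch of one possible reading. This is not the authors' implementation: the function names, the kernel sizes, combining the per-kernel max-pooled maps by averaging, the rectangle size, the gain `alpha`, and the rule "scale activations above the local mean by `1 + alpha`" are all hypothetical assumptions. Like the pooling layer discussed in the abstract, the sketch has no learnable parameters, so swapping it in would leave pretrained weights untouched.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def max_pool_same(x, k):
    """Stride-1 max pooling over the last two axes, same output size (odd k only)."""
    assert k % 2 == 1, "this sketch assumes odd kernel sizes"
    x = np.asarray(x, dtype=float)
    p = k // 2
    pad = [(0, 0)] * (x.ndim - 2) + [(p, p), (p, p)]
    # Pad with -inf so padded cells never win the max.
    xp = np.pad(x, pad, mode="constant", constant_values=-np.inf)
    return sliding_window_view(xp, (k, k), axis=(-2, -1)).max(axis=(-2, -1))


def multi_kernel_pool(x, kernel_sizes=(3, 5), stride=2):
    """Hypothetical multi-kernel pooling: average several max-pooled maps, then subsample."""
    pooled = np.mean([max_pool_same(x, k) for k in kernel_sizes], axis=0)
    return pooled[..., ::stride, ::stride]


def local_mean(x, kh, kw):
    """Mean over a kh x kw rectangle (odd sizes), counting only in-bounds cells."""
    ph, pw = kh // 2, kw // 2
    pad = [(0, 0)] * (x.ndim - 2) + [(ph, ph), (pw, pw)]
    sums = sliding_window_view(np.pad(x, pad), (kh, kw), axis=(-2, -1)).sum(axis=(-2, -1))
    ones = np.pad(np.ones(x.shape[-2:]), [(ph, ph), (pw, pw)])
    counts = sliding_window_view(ones, (kh, kw)).sum(axis=(-2, -1))
    return sums / counts


def enhance_above_average(x, rect=(3, 5), alpha=0.5):
    """Hypothetical enhancement: boost activations above their rectangular local mean."""
    x = np.asarray(x, dtype=float)
    mu = local_mean(x, *rect)
    return np.where(x > mu, (1.0 + alpha) * x, x)


def esfl_like_pool(x, kernel_sizes=(3, 5), stride=2, rect=(3, 5), alpha=0.5):
    """Pool with multiple kernels first, then enhance above-average activations."""
    return enhance_above_average(multi_kernel_pool(x, kernel_sizes, stride), rect, alpha)


if __name__ == "__main__":
    feats = np.random.default_rng(0).random((8, 14, 14))  # (channels, H, W)
    print(esfl_like_pool(feats).shape)  # (8, 7, 7)
```

Averaging the max-pooled maps keeps the output on the same scale as a single pooling layer; a max or learned weighting would be equally plausible readings, which is why these choices are labeled as assumptions here.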