PDNet: Towards Better One-stage Object Detection with Prediction Decoupling

Authors

Yang Li, Xu Yan, Wang Shaoru, Yuan Chunfeng, Zhang Ziqi, Li Bing, Hu Weiming

Publication information

IEEE Trans Image Process. 2022 Jul 28;PP. doi: 10.1109/TIP.2022.3193223.

Abstract

Recent one-stage object detectors follow a per-pixel prediction approach that predicts both the object category scores and boundary positions from every single grid location. However, the most suitable positions for inferring different targets, i.e., the object category and boundaries, are generally different. Predicting all these targets from the same grid location thus may lead to sub-optimal results. In this paper, we analyze the suitable inference positions for object category and boundaries, and propose a prediction-target-decoupled detector named PDNet to establish a more flexible detection paradigm. Our PDNet with the prediction decoupling mechanism encodes different targets separately in different locations. A learnable prediction collection module is devised with two sets of dynamic points, i.e., dynamic boundary points and semantic points, to collect and aggregate the predictions from the favorable regions for localization and classification. We adopt a two-step strategy to learn these dynamic point positions, where the prior positions are estimated for different targets first, and the network further predicts residual offsets to the positions with better perceptions of the object properties. Extensive experiments on the MS COCO benchmark demonstrate the effectiveness and efficiency of our method. With a single ResNeXt-64x4d-101-DCN as the backbone, our detector achieves 50.1 AP with single-scale testing, which outperforms the state-of-the-art methods by an appreciable margin under the same experimental settings. Moreover, our detector is highly efficient as a one-stage framework. Our code will be public.
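
The two-step point learning described in the abstract (a prior position first, then a predicted residual offset) can be illustrated with a small sketch. This is not the authors' implementation. The function names `bilinear_sample` and `collect_prediction`, the use of bilinear interpolation, and the plain averaging of the sampled values are all assumptions made for illustration only; the abstract does not specify how predictions are sampled or aggregated.

```python
import math


def bilinear_sample(pred_map, x, y):
    """Sample a 2D grid (list of rows) at fractional (x, y), clamped to the border.

    Hypothetical helper: bilinear interpolation is an assumption, not a detail
    taken from the paper.
    """
    h, w = len(pred_map), len(pred_map[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = pred_map[y0][x0] * (1 - fx) + pred_map[y0][x1] * fx
    bottom = pred_map[y1][x0] * (1 - fx) + pred_map[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def collect_prediction(pred_map, prior_points, residual_offsets):
    """Collect predictions at dynamic points and aggregate them.

    Each dynamic point is prior position + residual offset (the two-step idea).
    Averaging the samples is an illustrative assumption.
    """
    samples = [
        bilinear_sample(pred_map, px + dx, py + dy)
        for (px, py), (dx, dy) in zip(prior_points, residual_offsets)
    ]
    return sum(samples) / len(samples)


# Toy dense prediction map whose value equals the x coordinate of each cell.
pred_map = [[0.0, 1.0, 2.0] for _ in range(3)]
prior_points = [(0.0, 0.0), (2.0, 2.0)]       # step 1: estimated prior positions
residual_offsets = [(0.5, 0.0), (-0.5, 0.0)]  # step 2: predicted residual offsets
aggregated = collect_prediction(pred_map, prior_points, residual_offsets)
```

In this toy setup the two dynamic points land at x = 0.5 and x = 1.5, so the aggregated value is the mean of the two interpolated predictions. In a real detector the offsets would come from a network branch and separate point sets would serve localization and classification.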
