Li Ya-Li, Wang Shengjin
IEEE Trans Image Process. 2022;31:2620-2632. doi: 10.1109/TIP.2022.3157453. Epub 2022 Mar 22.
In recent years, object detection has made remarkable progress with the development of deep neural networks. However, detection performance still suffers from the dilemma between complex networks and single-vector predictions. In this paper, we propose a novel approach that boosts object detection performance by aggregating predictions. First, we propose a unified module with an adjustable hyper-structure to generate multiple predictions from a single detection network. Second, we formulate additive learning for aggregating predictions, which reduces the classification and regression losses by progressively adding prediction values. Following the gradient boosting strategy, the optimization of the additional predictions is further modeled as weighted regression problems that fit the Newton-descent directions. By aggregating multiple predictions from a single network, we propose the BooDet approach, which can Bootstrap the classification and bounding box regression for high-performance object Detection. In particular, we plug BooDet into Cascade R-CNN for object detection. Extensive experiments show that the proposed approach effectively improves object detection. We obtain an improvement of 1.3% to 2.0% over the strong Cascade R-CNN baseline on the COCO val dataset, and we achieve 56.5% AP on the COCO test-dev dataset with only bounding box annotations.
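The additive learning idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy binary classification problem with logistic loss, and a weighted 1-D linear fit stands in for each additional prediction produced by the network. All names (`newton_targets`, `fit_weighted_line`, `aggregate_predictions`) and the data are hypothetical. Each stage fits the Newton-descent direction `-g/h` by weighted least squares with weights `h`, then adds the fitted values to the running score.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(scores, labels):
    """Mean binary cross-entropy computed on raw scores (logits)."""
    total = 0.0
    for s, y in zip(scores, labels):
        p = sigmoid(s)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


def newton_targets(scores, labels):
    """Per-sample Newton direction -g/h and weight h for the logistic loss."""
    targets, weights = [], []
    for s, y in zip(scores, labels):
        p = sigmoid(s)
        g = p - y          # first derivative of the loss w.r.t. the score
        h = p * (1.0 - p)  # second derivative (assumed non-zero on this toy data)
        targets.append(-g / h)
        weights.append(h)
    return targets, weights


def fit_weighted_line(xs, targets, weights):
    """Weighted least squares for f(x) = a * x + b (toy stand-in for a prediction head)."""
    sw = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / sw
    mz = sum(w * z for w, z in zip(weights, targets)) / sw
    var = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    cov = sum(w * (x - mx) * (z - mz) for w, x, z in zip(weights, xs, targets))
    a = cov / var if var > 0 else 0.0
    return a, mz - a * mx


def aggregate_predictions(xs, labels, base_scores, num_extra=3):
    """Progressively add predictions; record the loss after each stage."""
    scores = list(base_scores)
    losses = [log_loss(scores, labels)]
    for _ in range(num_extra):
        targets, weights = newton_targets(scores, labels)
        a, b = fit_weighted_line(xs, targets, weights)
        scores = [s + a * x + b for s, x in zip(scores, xs)]
        losses.append(log_loss(scores, labels))
    return scores, losses


# Hypothetical toy data: one feature, labels that are not linearly separable.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 1, 0, 1, 1]
base_scores = [0.0] * len(xs)  # uninformative first prediction

final_scores, losses = aggregate_predictions(xs, labels, base_scores)
```

On this toy data the recorded `losses` shrink with every added prediction, which is the behavior the additive formulation aims for. In the detection setting the same weighted-regression view would apply to both classification scores and box regression outputs, with the extra predictions coming from the network itself.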