Xu Hongyu, Lv Xutao, Wang Xiaoyu, Ren Zhou, Bodla Navaneeth, Chellappa Rama
IEEE Trans Pattern Anal Mach Intell. 2021 Jun;43(6):1914-1927. doi: 10.1109/TPAMI.2019.2957780. Epub 2021 May 11.
In this article, we propose a novel object detection algorithm named "Deep Regionlets" that integrates deep neural networks with a conventional detection schema for accurate generic object detection. Motivated by the effectiveness of regionlets for modeling object deformations and multiple aspect ratios, we incorporate regionlets into an end-to-end trainable deep learning framework. The Deep Regionlets framework consists of a region selection network and a deep regionlet learning module. Specifically, given a detection bounding box proposal, the region selection network provides guidance on where to select the sub-regions from which features can be learned. An object proposal typically contains 3 to 16 sub-regions. The regionlet learning module focuses on local feature selection and transformations to alleviate the effects of appearance variations. To this end, we first realize non-rectangular region selection within the detection framework to accommodate variations in object appearance. Moreover, we design a "gating network" within the regionlet learning module to enable instance-dependent soft feature selection and pooling. The Deep Regionlets framework is trained end-to-end without additional effort. We present ablation studies and extensive experiments on the PASCAL VOC dataset and the Microsoft COCO dataset. The proposed method yields performance competitive with state-of-the-art algorithms, such as RetinaNet and Mask R-CNN, even without additional segmentation labels.
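The abstract describes the gating network only at a high level, so the following is a minimal sketch of what instance-dependent soft feature selection followed by pooling could look like, not the authors' implementation. The function name `gated_pool`, the linear-plus-sigmoid gate, and the element-wise max pooling are all assumptions made for illustration; the abstract does not specify the form of the gate or the pooling operator.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score to a gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_pool(regionlet_feats, gate_w, gate_b):
    """Hypothetical soft selection over the sub-regions of one proposal.

    regionlet_feats: list of R feature vectors, one per sub-region.
    gate_w, gate_b:  parameters of an assumed linear gate shared by all
                     sub-regions (illustrative, not from the paper).
    Returns (gates, pooled): one gate per sub-region and one pooled vector.
    """
    gates = []
    gated = []
    for feat in regionlet_feats:
        # The gate depends on the features of this instance, so different
        # proposals weight their sub-regions differently.
        score = sum(w * f for w, f in zip(gate_w, feat)) + gate_b
        g = sigmoid(score)
        gates.append(g)
        # Soft selection: scale the features rather than keep or drop them.
        gated.append([g * f for f in feat])
    # Assumed pooling step: element-wise max across the gated sub-regions.
    pooled = [max(channel) for channel in zip(*gated)]
    return gates, pooled


# Two sub-regions with two feature channels each; a zero gate gives 0.5.
gates, pooled = gated_pool([[1.0, 2.0], [3.0, 0.5]], [0.0, 0.0], 0.0)
```

With zero gate parameters every sub-region receives the same weight of 0.5, so the pooled vector is just half of the channel-wise maximum. In a trained model the gate parameters would be learned jointly with the rest of the network, which is what makes the selection instance dependent.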