IEEE Trans Cybern. 2022 Apr;52(4):2300-2313. doi: 10.1109/TCYB.2020.3004636. Epub 2022 Apr 5.
State-of-the-art object detectors usually downsample the input image progressively until it is represented by small feature maps, which loses spatial information and compromises the representation of small objects. In this article, we propose a context-aware block net (CAB Net) that improves small object detection by building feature maps with both high spatial resolution and strong semantics. To enhance the representation capacity of high-resolution feature maps internally, we carefully design the context-aware block (CAB). CAB uses pyramidal dilated convolutions to incorporate multilevel contextual information while preserving the original resolution of the feature maps. We then attach CAB to the end of a truncated backbone network (e.g., VGG16) with a relatively small downsampling factor (e.g., 8) and discard all subsequent layers. CAB Net can capture both the basic visual patterns and the semantic information of small objects, thus improving small object detection performance. Experiments on the Tsinghua-Tencent 100K benchmark and the Airport dataset show that CAB Net outperforms other top-performing detectors by a large margin while maintaining real-time speed, demonstrating its effectiveness for small object detection.
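The abstract does not give CAB's exact layer configuration, so the following is only a minimal, framework-free Python sketch of the two mechanisms it names. First, a 3×3 dilated convolution with stride 1 and zero padding equal to the dilation rate d keeps the feature map at its original height and width, while its effective kernel widens to 2d + 1. Second, a smaller downsampling factor leaves a larger feature map. The dilation rates (1, 2, 4), the averaging fusion of branches, the 512-pixel input, and all function names are illustrative assumptions, not the authors' design.

```python
from typing import List, Sequence

Grid = List[List[float]]


def dilated_conv2d(x: Grid, kernel: Grid, dilation: int) -> Grid:
    """Single-channel dilated convolution (cross-correlation, as in deep
    learning frameworks), stride 1, with zero "same" padding so the output
    has the same height and width as the input. Assumes an odd kernel size."""
    h, w = len(x), len(x[0])
    k = len(kernel)
    pad = dilation * (k - 1) // 2  # equals `dilation` for a 3x3 kernel
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    # Taps are spaced `dilation` pixels apart.
                    r = i + u * dilation - pad
                    c = j + v * dilation - pad
                    if 0 <= r < h and 0 <= c < w:  # zero padding outside
                        acc += kernel[u][v] * x[r][c]
            out[i][j] = acc
    return out


def effective_kernel(k: int, dilation: int) -> int:
    """Spatial extent covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


def pyramid_context_block(x: Grid, kernel: Grid,
                          rates: Sequence[int] = (1, 2, 4)) -> Grid:
    """Hypothetical pyramid of parallel dilated branches fused by
    element-wise averaging; every branch keeps the input resolution."""
    branches = [dilated_conv2d(x, kernel, d) for d in rates]
    h, w = len(x), len(x[0])
    return [[sum(b[i][j] for b in branches) / len(branches)
             for j in range(w)] for i in range(h)]


def feature_map_side(input_side: int, downsample: int) -> int:
    """Side length of the feature map after a given total downsampling."""
    return input_side // downsample


if __name__ == "__main__":
    x = [[float(i * 8 + j) for j in range(8)] for i in range(8)]
    kernel = [[1.0 / 9] * 3 for _ in range(3)]  # simple averaging kernel
    y = pyramid_context_block(x, kernel)
    print(len(y), len(y[0]))  # 8 8: resolution is preserved
    print([effective_kernel(3, d) for d in (1, 2, 4)])  # [3, 5, 9]
    print(feature_map_side(512, 8), feature_map_side(512, 32))  # 64 16
```

With a downsampling factor of 8, a 16-pixel object still spans about 2×2 cells of the feature map, whereas at a factor of 32 it covers less than one cell; the dilated branches widen context without paying that resolution cost.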