School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Korea.
Sensors (Basel). 2021 Apr 17;21(8):2842. doi: 10.3390/s21082842.
Deep learning has brought great success to object detection, but detecting small objects remains a challenging task in computer vision. To address this problem, we propose an improved single-shot multibox detector (SSD) that uses enhanced feature map blocks (SSD-EMB). The enhanced feature map block (EMB) consists of an attention stream and a feature map concatenation stream. The attention stream uses channel averaging and normalization so that the proposed model focuses on object regions rather than the background. The feature map concatenation stream provides additional semantic information to the model without degrading the detection speed. Combining the outputs of the two streams yields the enhanced feature map, which improves the detection of small objects. Experimental results show that the proposed model achieves high accuracy in small object detection while maintaining a good detection speed. The SSD-EMB achieved a mean average precision (mAP) of 80.4% on the PASCAL VOC 2007 dataset at 30 frames per second on an RTX 2080Ti graphics processing unit, an mAP of 79.9% on the VOC 2012 dataset, and an mAP of 26.6% on the MS COCO dataset.
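The two-stream structure of the EMB can be illustrated with a minimal NumPy sketch. The abstract does not specify the normalization, the source of the concatenated features, or how the two streams are combined, so the following choices are assumptions made only for illustration: min-max normalization of the channel-averaged map, nearest-neighbor upsampling of a deeper feature map before concatenation, and a residual-style reweighting `fused * (1 + attention)`. All function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def attention_stream(feat):
    """Channel-average a (C, H, W) feature map, then normalize to [0, 1].

    Min-max normalization is an assumption; the abstract only says
    "channel averaging" and "normalization".
    """
    avg = feat.mean(axis=0, keepdims=True)  # shape (1, H, W)
    lo, hi = avg.min(), avg.max()
    return (avg - lo) / (hi - lo + 1e-8)


def upsample_nearest(feat, factor):
    """Nearest-neighbor upsampling of a (C, H, W) map by an integer factor."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)


def concat_stream(shallow, deep):
    """Concatenate a shallow map with an upsampled deeper map along channels.

    Using a deeper layer as the source of extra semantic information
    is an assumption, not something the abstract states.
    """
    factor = shallow.shape[1] // deep.shape[1]
    return np.concatenate([shallow, upsample_nearest(deep, factor)], axis=0)


def enhanced_feature_map(shallow, deep):
    """Combine both streams; the combination rule here is hypothetical."""
    attention = attention_stream(shallow)  # (1, H, W), broadcast over channels
    fused = concat_stream(shallow, deep)   # (C_shallow + C_deep, H, W)
    return fused * (1.0 + attention)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shallow = rng.standard_normal((4, 8, 8))  # toy high-resolution map
    deep = rng.standard_normal((6, 4, 4))     # toy low-resolution map
    print(enhanced_feature_map(shallow, deep).shape)  # (10, 8, 8)
```

In this sketch the attention map has a single channel, so it rescales every channel of the concatenated map at the same spatial location, which is one simple way to emphasize object regions over background.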