Liu Tianrui, Stathaki Tania
Department of Electrical and Electronic Engineering, Imperial College London, London, United Kingdom.
Front Neurorobot. 2018 Oct 5;12:64. doi: 10.3389/fnbot.2018.00064. eCollection 2018.
Convolutional neural networks (CNNs) have enabled significant improvements in pedestrian detection owing to the strong representation ability of CNN features. However, it is generally difficult to reduce false positives on hard negative samples such as tree leaves, traffic lights, and poles. Some of these hard negatives can be removed by exploiting high-level semantic vision cues. In this paper, we propose a region-based CNN method that makes use of semantic cues for better pedestrian detection. Our method extends the Faster R-CNN detection framework by adding a network branch for semantic image segmentation. The semantic network computes complementary higher-level semantic features to be integrated with the convolutional features. We make use of multi-resolution feature maps extracted from different network layers to ensure good detection accuracy for pedestrians at different scales. A boosted forest is trained on the integrated features in a cascaded manner for hard negative mining. Experiments on the Caltech pedestrian dataset show improvements in detection accuracy with the semantic network. With the deep VGG16 model, our pedestrian detection method achieves robust detection performance on the Caltech dataset.
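The pipeline described above combines region proposals, multi-resolution convolutional features, a semantic segmentation branch, and a boosted forest for hard negative mining. The following minimal sketch is not the authors' implementation; it only illustrates, under stated assumptions, how pooled multi-resolution and semantic features for each proposal could be concatenated and scored by a boosted ensemble. The array shapes, the toy RoI pooling, the proposal boxes, and the use of scikit-learn's GradientBoostingClassifier as a stand-in for the cascaded boosted forest are all assumptions for illustration.

```python
# Hypothetical sketch, not the authors' code: concatenate RoI-pooled features from
# several convolutional layers with RoI-pooled semantic-segmentation scores, then
# score proposals with a boosted ensemble (stand-in for the cascaded boosted forest).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier


def roi_pool(feature_map, box, output_size=7):
    """Crude RoI max-pooling: crop box (x1, y1, x2, y2) from a (C, H, W) map
    and pool it to (C, output_size, output_size)."""
    c, _, _ = feature_map.shape
    x1, y1, x2, y2 = box
    crop = feature_map[:, y1:y2, x1:x2]
    pooled = np.zeros((c, output_size, output_size), dtype=feature_map.dtype)
    ys = np.linspace(0, crop.shape[1], output_size + 1).astype(int)
    xs = np.linspace(0, crop.shape[2], output_size + 1).astype(int)
    for i in range(output_size):
        for j in range(output_size):
            cell = crop[:, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            pooled[:, i, j] = cell.max(axis=(1, 2))
    return pooled


def proposal_feature(conv_maps, seg_map, box):
    """Integrated feature for one proposal: pooled multi-resolution conv features
    concatenated with pooled semantic-segmentation scores."""
    feats = [roi_pool(f, box).ravel() for f in conv_maps]
    feats.append(roi_pool(seg_map, box).ravel())
    return np.concatenate(feats)


# Toy arrays standing in for CNN outputs: two conv feature maps at different
# resolutions and a per-pixel semantic score map (e.g., a "person" channel).
rng = np.random.default_rng(0)
conv_maps = [rng.standard_normal((8, 60, 80)), rng.standard_normal((8, 30, 40))]
seg_map = rng.random((1, 60, 80))

# Fake proposals and labels (1 = pedestrian, 0 = hard negative such as a pole).
boxes = [(rng.integers(0, 20), rng.integers(0, 20),
          rng.integers(30, 40), rng.integers(25, 30)) for _ in range(40)]
labels = np.array([i % 2 for i in range(len(boxes))])

X = np.stack([proposal_feature(conv_maps, seg_map, b) for b in boxes])
clf = GradientBoostingClassifier(n_estimators=50).fit(X, labels)
scores = clf.predict_proba(X)[:, 1]  # per-proposal pedestrian confidence
```

In the sketch the semantic channel simply contributes extra dimensions to each proposal's feature vector; the intuition from the abstract is that proposals falling on leaves, poles, or traffic lights receive low semantic "person" scores, which helps the boosted classifier reject them.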