School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China.
China Mobile Research Institute, Beijing 100053, China.
Sensors (Basel). 2021 Jun 18;21(12):4184. doi: 10.3390/s21124184.
Multispectral pedestrian detection, which combines a color stream and a thermal stream, is essential under insufficient illumination because fusing the two streams provides complementary information for detecting pedestrians with deep convolutional neural networks (CNNs). In this paper, we introduced and adapted the simple and efficient one-stage YOLOv4 to replace the current state-of-the-art two-stage Fast R-CNN for multispectral pedestrian detection, directly predicting bounding boxes with confidence scores. To further improve detection performance, we analyzed existing multispectral fusion methods and proposed a novel multispectral channel feature fusion (MCFF) module that integrates features from the color and thermal streams according to the illumination conditions. Moreover, several fusion architectures, namely Early Fusion, Halfway Fusion, Late Fusion, and Direct Fusion, were carefully designed based on the MCFF to transfer feature information from the bottom to the top of the network at different stages. Finally, experimental results on the KAIST and Utokyo pedestrian benchmarks showed that Halfway Fusion achieved the best performance among all architectures and that the MCFF could adaptively fuse features from the two modalities. Under the reasonable setting, the log-average miss rates (MR) on the two benchmarks were 4.91% and 23.14%, respectively.
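The abstract describes channel-wise fusion of color and thermal features conditioned on illumination. The paper's actual MCFF module learns its channel weights; the following is only a minimal NumPy sketch of the general idea, with a fixed scalar illumination estimate standing in for the learned gating (the function name and shapes are hypothetical, not from the paper):

```python
import numpy as np

def channel_feature_fusion(color_feat, thermal_feat, illumination):
    """Fuse color and thermal feature maps with channel-wise weights.

    Hypothetical sketch of illumination-conditioned fusion: brighter
    scenes weight the color stream more, darker scenes the thermal
    stream. Feature maps have shape (C, H, W); illumination is a
    scalar in [0, 1] (1 = well lit).
    """
    c = color_feat.shape[0]
    # Per-channel gates derived from the illumination estimate.
    # (The real MCFF learns these weights; this stand-in is fixed.)
    w_color = np.full((c, 1, 1), illumination)
    w_thermal = 1.0 - w_color
    return w_color * color_feat + w_thermal * thermal_feat

# Daytime fusion leans on the color stream; nighttime on the thermal one.
color = np.ones((4, 8, 8))      # dummy color-stream features
thermal = np.zeros((4, 8, 8))   # dummy thermal-stream features
day = channel_feature_fusion(color, thermal, illumination=0.9)
night = channel_feature_fusion(color, thermal, illumination=0.1)
```

The convex per-channel combination keeps the fused tensor at the same shape as either input, so it can be inserted at any stage of the detector, which is what makes the Early/Halfway/Late/Direct variants possible.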