探索 RGB+深度融合的实时目标检测。

Exploring RGB+Depth Fusion for Real-Time Object Detection.

机构信息

EAVISE, KU Leuven, 2860 Sint-Katelijne-Waver, Belgium.

出版信息

Sensors (Basel). 2019 Feb 19;19(4):866. doi: 10.3390/s19040866.

DOI:10.3390/s19040866

PMID:30791476

原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC6412390/

Abstract

In this paper, we investigate whether fusing depth information on top of normal RGB data for camera-based object detection can help to increase the performance of current state-of-the-art single-shot detection networks. Indeed, depth sensing is easily acquired using depth cameras such as a Kinect or stereo setups. We investigate the optimal manner to perform this sensor fusion with a special focus on lightweight single-pass convolutional neural network (CNN) architectures, enabling real-time processing on limited hardware. For this, we implement a network architecture allowing us to parameterize at which network layer both information sources are fused together. We performed exhaustive experiments to determine the optimal fusion point in the network, from which we can conclude that fusing towards the mid to late layers provides the best results. Our best fusion models significantly outperform the baseline RGB network in both accuracy and localization of the detections.

摘要

在本文中，我们研究了在基于相机的目标检测中融合深度信息（在常规 RGB 数据之上）是否有助于提高当前最先进的单次检测网络的性能。实际上，深度感应可以使用深度相机（如 Kinect 或立体设置）轻松获取。我们研究了使用专用的轻量级单通卷积神经网络（CNN）架构执行这种传感器融合的最佳方法，从而能够在有限的硬件上进行实时处理。为此，我们实现了一种网络架构，允许我们在网络的哪个层参数化来融合这两种信息源。我们进行了详尽的实验，以确定网络中最佳的融合点，由此我们可以得出结论，融合到中晚期层可以提供最佳结果。我们的最佳融合模型在检测的准确性和定位方面都明显优于基线 RGB 网络。