End-to-End Monocular Range Estimation for Forward Collision Warning.

Affiliation

College of Intelligence Science, National University of Defense Technology, Changsha 410073, China.

Publication information

Sensors (Basel). 2020 Oct 21;20(20):5941. doi: 10.3390/s20205941.

DOI:10.3390/s20205941

PMID:33096656

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7589406/

Abstract

Estimating the range to the closest object ahead is the core component of a forward collision warning (FCW) system. Previous monocular range estimation methods mostly involve two sequential steps: object detection followed by range estimation. As a result, they work only for objects from specific categories, whose training relies on expensive object-level annotation, and fail on unseen categories. In this paper, we present an end-to-end deep learning architecture that addresses these problems. Specifically, we represent the target range as a weighted sum of a set of potential distances. These potential distances are generated by inverse perspective projection from the intrinsic and extrinsic camera parameters, while a deep neural network predicts the corresponding weights. The whole architecture is optimized directly for the range estimation task in an end-to-end manner, with only the target range as supervision. Because object category is not restricted during training, the proposed method can generalize to objects of unseen categories. Furthermore, camera parameters are considered explicitly, which allows the method to generalize to images taken with different cameras and from novel views. Additionally, the proposed method is not a pure black box: visualizing the predicted weights shows which part of the image dominates the final result, providing partial interpretability. We conduct experiments on synthetic and real-world data to verify these properties.
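The weighted-sum formulation can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a pinhole camera over a flat road, one potential distance per image row, and a softmax over hypothetical per-row scores standing in for the network output. The function names and the parameters `fy`, `cy`, `cam_height`, and `pitch` are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def potential_distances(rows, fy, cy, cam_height, pitch=0.0):
    """Forward ground distance seen at each image row (flat-road assumption).

    rows:       pixel row indices v, increasing downward
    fy, cy:     vertical focal length and principal point, in pixels
    cam_height: camera height above the road, in metres
    pitch:      downward tilt of the optical axis, in radians
    Rows at or above the horizon never hit the ground and get inf.
    """
    rows = np.asarray(rows, dtype=float)
    # Angle of the viewing ray below the horizontal for each row.
    angle = pitch + np.arctan((rows - cy) / fy)
    dist = np.full(rows.shape, np.inf)
    below = angle > 0
    dist[below] = cam_height / np.tan(angle[below])
    return dist


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()


def estimate_range(logits, distances):
    """Range as a weighted sum of potential distances.

    logits stands in for the network output (an assumption of this sketch).
    Rows with no finite ground distance are excluded before normalising.
    """
    distances = np.asarray(distances, dtype=float)
    finite = np.isfinite(distances)
    weights = softmax(np.asarray(logits, dtype=float)[finite])
    return float(np.sum(weights * distances[finite]))


if __name__ == "__main__":
    d = potential_distances([600, 700, 1000], fy=1000.0, cy=500.0, cam_height=1.5)
    print(d)                                   # candidate distances per row
    print(estimate_range([0.0, 50.0, 0.0], d)) # weights concentrated on row 700
```

Because the candidate distances are computed from the camera parameters and are not learned, changing `fy`, `cam_height`, or `pitch` changes the candidates while the weight predictor stays the same, which is the intuition behind the claimed generalization across cameras and views. Inspecting the weights shows which rows dominate the estimate.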
