GRAM, Department of Signal Theory and Communications, University of Alcalá, 28805 Alcalá de Henares, Spain.
BBVA Next Technologies, 28050 Madrid, Spain.
Sensors (Basel). 2019 Sep 20;19(19):4062. doi: 10.3390/s19194062.
In this work, we address the problem of multi-vehicle detection and tracking for traffic monitoring applications. We present a novel intelligent visual sensor for tracking-by-detection with simultaneous pose estimation. Essentially, we adapt an Extended Kalman Filter (EKF) to work not only with the detections of the vehicles but also with their estimated coarse viewpoints, obtained directly from the vision sensor. We show that enhancing the tracking with observations of the vehicle pose results in a better estimation of the vehicles' trajectories. For the simultaneous object detection and viewpoint estimation task, we present and evaluate two independent solutions. The first is based on a fast GPU implementation of a Histogram of Oriented Gradients (HOG) detector with Support Vector Machines (SVMs). For the second, we modify and train the Faster R-CNN deep learning model so that it recovers not only the object localization but also an estimate of its pose. Finally, we publicly release a challenging dataset, the GRAM Road Traffic Monitoring (GRAM-RTM) dataset, which has been specifically designed for evaluating multi-vehicle tracking approaches in the context of traffic monitoring applications. It comprises more than 700 unique vehicles annotated across more than 40,300 frames of three videos. We expect GRAM-RTM to become a benchmark in vehicle detection and tracking, providing the computer vision and intelligent transportation systems communities with a standard set of images, annotations and evaluation procedures for multi-vehicle tracking. We present a thorough experimental evaluation of our approaches on GRAM-RTM, which will be useful for establishing further comparisons. The results obtained confirm that the simultaneous integration of vehicle localizations and pose estimations as observations in an EKF improves the tracking results.
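The idea of feeding both the detection and the coarse viewpoint into an EKF can be illustrated with a minimal sketch. The abstract does not specify the state vector, motion model, or noise settings, so everything below is an assumption made for illustration: a state `[px, py, v, theta]` (position, speed, heading), a constant-speed motion model, and an observation `[px, py, theta]` in which `theta` stands in for the coarse viewpoint reported by the detector. The function names `ekf_predict`, `ekf_update`, and `wrap_angle` are hypothetical and are not taken from the paper.

```python
import numpy as np

# Observation matrix (assumed): the sensor reports position and a coarse heading.
H = np.array([
    [1.0, 0.0, 0.0, 0.0],  # observed px
    [0.0, 1.0, 0.0, 0.0],  # observed py
    [0.0, 0.0, 0.0, 1.0],  # observed theta (coarse viewpoint)
])


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + np.pi) % (2.0 * np.pi) - np.pi


def ekf_predict(x, P, dt, Q):
    """Propagate state and covariance with an assumed constant-speed model."""
    px, py, v, th = x
    x_pred = np.array([
        px + v * np.cos(th) * dt,
        py + v * np.sin(th) * dt,
        v,
        th,
    ])
    # Jacobian of the motion model, evaluated at the current state.
    F = np.array([
        [1.0, 0.0, np.cos(th) * dt, -v * np.sin(th) * dt],
        [0.0, 1.0, np.sin(th) * dt, v * np.cos(th) * dt],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ])
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred


def ekf_update(x, P, z, R):
    """Correct the prediction with a detection z = [px, py, theta]."""
    y = z - H @ x                    # innovation
    y[2] = wrap_angle(y[2])          # heading residual must stay in [-pi, pi)
    S = H @ P @ H.T + R              # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x + K @ y
    x_new[3] = wrap_angle(x_new[3])
    P_new = (np.eye(4) - K @ H) @ P
    return x_new, P_new


# Usage: one predict/update cycle with illustrative noise values.
x0 = np.array([0.0, 0.0, 1.0, 0.0])
P0 = np.eye(4)
Q = 0.01 * np.eye(4)
R = np.diag([0.5, 0.5, 0.3])  # the viewpoint is coarse, so its noise is set fairly high

x_pred, P_pred = ekf_predict(x0, P0, dt=1.0, Q=Q)
z = np.array([1.1, -0.1, 0.2])  # detected position plus coarse heading
x_est, P_est = ekf_update(x_pred, P_pred, z, R)
```

In this sketch, dropping the third row of `H` (and the matching entry of `R`) gives a position-only tracker, which is one simple way to compare tracking with and without pose observations.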