Department of Electronics Engineering and Institute of Electronics, National Chiao Tung University, Hsinchu 30010, Taiwan.
Sensors (Basel). 2020 Sep 15;20(18):5269. doi: 10.3390/s20185269.
This paper proposes a deep-learning model with task-specific bounding box regressors (TSBBRs) and a conditional back-propagation mechanism for detecting objects in motion in advanced driver assistance system (ADAS) applications. The model uses separate detection networks for objects of different sizes, which improves detection results for both large and small objects. For large objects, a network with a larger visual receptive field gathers information from larger areas. For small objects, a network with a smaller receptive field exploits fine-grained features. The conditional back-propagation mechanism trains different types of TSBBRs through data-driven learning under a set criterion, so that they learn representations of different object sizes without degrading each other. With this dual-path design, the bounding box regressors can simultaneously detect objects across widely differing scales and aspect ratios. A single neural-network inference per frame is enough to detect multiple object types, such as bicycles, motorbikes, cars, buses, trucks, and pedestrians, and to locate their exact positions. The model was developed and implemented on several NVIDIA devices, namely the 1080 Ti, DRIVE-PX2, and Jetson TX-2, where it processes 448 × 448 video input at 67 frames per second (fps), 19.4 fps, and 8.9 fps, respectively. It can detect objects as small as 13 × 13 pixels and achieves 86.54% accuracy on the publicly available Pascal Visual Object Classes (VOC) car database and 82.4% mean average precision (mAP) on the iVS database, a large collection of common real-world road scenes.
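The conditional back-propagation idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the routing criterion, so the area threshold, the `Box` type, and the names `route` and `conditional_losses` are all hypothetical, and a plain squared error stands in for whatever regression loss the paper uses. The sketch only shows the general pattern of letting each ground-truth box contribute loss (and hence gradient) to one regressor branch, leaving the other branch untouched by that box.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box given as centre (cx, cy), width and height in pixels."""
    cx: float
    cy: float
    w: float
    h: float


# ASSUMPTION: an area threshold is used as the "set criterion".
# The abstract does not say what the real criterion is.
AREA_THRESHOLD = 64 * 64


def route(target: Box, area_threshold: float = AREA_THRESHOLD) -> str:
    """Pick the regressor branch responsible for a ground-truth box."""
    return "large" if target.w * target.h >= area_threshold else "small"


def squared_error(pred: Box, target: Box) -> float:
    """Sum of squared differences over (cx, cy, w, h); a stand-in loss."""
    return sum((p - t) ** 2 for p, t in zip(astuple(pred), astuple(target)))


def conditional_losses(targets, preds_small, preds_large,
                       area_threshold: float = AREA_THRESHOLD) -> dict:
    """Accumulate a separate loss per branch.

    Each target adds loss only to the branch it is routed to, so the
    other branch would receive zero gradient from that target.
    """
    losses = {"small": 0.0, "large": 0.0}
    for target, p_small, p_large in zip(targets, preds_small, preds_large):
        branch = route(target, area_threshold)
        pred = p_small if branch == "small" else p_large
        losses[branch] += squared_error(pred, target)
    return losses


# Toy data: one 13 x 13 object and one 150 x 100 object.
targets = [Box(20, 20, 13, 13), Box(200, 200, 150, 100)]
preds_small = [Box(21, 20, 13, 13), Box(0, 0, 0, 0)]       # second entry is ignored
preds_large = [Box(0, 0, 0, 0), Box(200, 198, 150, 100)]   # first entry is ignored

losses = conditional_losses(targets, preds_small, preds_large)
print(losses)
```

In a real training loop the two per-branch losses would be back-propagated through their own regressor heads; the point of the routing is that a poor prediction from the large-object branch on a 13 × 13 target never penalises that branch.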