Agrawal Shiva, Bhanderi Savankumar, Elger Gordon
Institute of Innovative Mobility (IIMo), Technische Hochschule Ingolstadt, 85049 Ingolstadt, Germany.
Fraunhofer IVI, Applied Center Connected Mobility and Infrastructure, 01069 Dresden, Germany.
Sensors (Basel). 2025 May 29;25(11):3422. doi: 10.3390/s25113422.
Mono RGB cameras and automotive radar sensors provide complementary information, which makes them excellent candidates for sensor data fusion to obtain robust road user detection. This approach has been widely used in the vehicle domain and has recently been introduced for roadside-mounted smart-infrastructure-based road user detection. However, the performance of the most commonly used late fusion methods often degrades when the camera fails to detect road users in adverse environmental conditions. The solution is to fuse the data with deep neural networks at an early stage of the fusion pipeline, so that the complete data provided by both sensors are exploited. Research has been carried out in this area, but it is limited to vehicle-based sensor setups. Hence, this work proposes a novel deep neural network to jointly fuse RGB mono-camera images and 3D automotive radar point cloud data for enhanced road user detection in a roadside-mounted smart infrastructure setup. Projected radar points are first used to generate anchors in image regions with a high likelihood of containing road users, including areas not visible to the camera. These anchors guide the prediction of 2D bounding boxes, object categories, and confidence scores. Valid detections are then used to segment the radar points by instance, and the results are post-processed to produce final road user detections in the ground plane. The trained model is evaluated under different lighting and weather conditions using ground truth data from a lidar sensor. It achieves a precision of 92%, a recall of 78%, and an F1-score of 85%. Compared to object-level spatial fusion, the proposed deep fusion methodology yields absolute improvements of 33%, 6%, and 21% in precision, recall, and F1-score, respectively.
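The anchor-generation step described above depends on projecting 3D radar points into the camera image. As a minimal sketch only (assuming a standard pinhole camera model with a hypothetical intrinsic matrix `K` and a radar-to-camera extrinsic transform `T`, not the calibration used in the paper), such a projection could look like:

```python
import numpy as np

def project_radar_to_image(points_3d, K, T):
    """Project 3D radar points into pixel coordinates.

    points_3d : (N, 3) array of points in the radar frame
    K         : (3, 3) camera intrinsic matrix
    T         : (4, 4) homogeneous radar-to-camera extrinsic transform
    Returns (M, 2) pixel coordinates and a boolean mask of points
    in front of the camera (M = mask.sum()).
    """
    n = points_3d.shape[0]
    pts_h = np.hstack([points_3d, np.ones((n, 1))])  # homogeneous coords
    cam = (T @ pts_h.T)[:3]                          # points in camera frame
    in_front = cam[2] > 0                            # discard points behind camera
    uv = K @ cam[:, in_front]
    uv = uv[:2] / uv[2]                              # perspective divide
    return uv.T, in_front

# Example: identity extrinsics, simple intrinsics; a point 10 m
# straight ahead projects onto the principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
T = np.eye(4)
uv, mask = project_radar_to_image(np.array([[0.0, 0.0, 10.0]]), K, T)
```

The projected pixel locations would then seed anchors in the detection head; the actual network architecture and anchor parameterization in the paper are not reproduced here.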