School of Automation, Northwestern Polytechnical University, Xi'an 710129, China.
Institute of Photonics & Photon Technology, Northwest University, Xi'an 710127, China.
Sensors (Basel). 2023 Mar 17;23(6):3229. doi: 10.3390/s23063229.
In autonomous driving, 3D object detection based on multi-modal data has become an indispensable means of perceiving the complex environment around the vehicle. In multi-modal detection, LiDAR and a camera are used simultaneously to capture and model the scene. However, because of the intrinsic discrepancies between LiDAR points and camera images, fusing the two kinds of data for object detection raises a series of problems, and as a result most multi-modal detection methods perform worse than LiDAR-only methods. In this study, we propose a method named PTA-Det to improve the performance of multi-modal detection. As part of PTA-Det, we propose a Pseudo Point Cloud Generation Network, which represents the textural and semantic features of keypoints in the image as pseudo points. A transformer-based Point Fusion Transition (PFT) module then deeply fuses the features of LiDAR points and of the image-derived pseudo points in a unified point-based form. The combination of these modules can overcome the main obstacle to cross-modal feature fusion and achieve a complementary and discriminative representation for proposal generation. Extensive experiments on the KITTI dataset support the effectiveness of PTA-Det, which achieves an mAP (mean average precision) of 77.88% on the car category with relatively few LiDAR input points.
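The abstract does not specify how either module is implemented. Purely as an illustration of what "fusing LiDAR points and image-derived pseudo points in a unified point-based form" can mean, the sketch below makes two assumptions that are not taken from the paper: pseudo points are obtained by back-projecting image keypoints through a pinhole camera model using a known per-keypoint depth, and a generic single-head cross-attention (LiDAR points as queries, pseudo points as keys and values) stands in for the PFT module. All function names, shapes, and the random weights are hypothetical.

```python
import numpy as np


def backproject_keypoints(uv, depth, K):
    """Lift 2D image keypoints to 3D camera-frame points (pinhole model).

    Hypothetical helper: assumes a depth value is available for each keypoint.
    uv:    (M, 2) pixel coordinates of keypoints
    depth: (M,)   depth along the camera z-axis for each keypoint
    K:     (3, 3) camera intrinsic matrix
    """
    ones = np.ones((uv.shape[0], 1))
    uv1 = np.concatenate([uv, ones], axis=1)      # (M, 3) homogeneous pixels
    rays = uv1 @ np.linalg.inv(K).T               # (M, 3) rays with z = 1
    return rays * depth[:, None]                  # scale each ray by its depth


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)       # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_fuse(lidar_feat, pseudo_feat, Wq, Wk, Wv):
    """Generic single-head cross-attention (a stand-in, not the paper's PFT).

    lidar_feat:  (N, C) features of N LiDAR points (queries)
    pseudo_feat: (M, C) features of M pseudo points (keys and values)
    Returns (N, C): LiDAR features plus attended pseudo-point context.
    """
    q = lidar_feat @ Wq                           # (N, d)
    k = pseudo_feat @ Wk                          # (M, d)
    v = pseudo_feat @ Wv                          # (M, C)
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)  # (N, M), rows sum to 1
    return lidar_feat + attn @ v                  # residual fusion


# Usage with toy values (intrinsics and features are made up).
K = np.array([[700.0, 0.0, 600.0],
              [0.0, 700.0, 180.0],
              [0.0, 0.0, 1.0]])
uv = np.array([[600.0, 180.0], [670.0, 250.0]])
depth = np.array([10.0, 20.0])
pseudo_xyz = backproject_keypoints(uv, depth, K)  # 3D positions of 2 pseudo points

rng = np.random.default_rng(0)
C, d = 8, 4
lidar_feat = rng.normal(size=(5, C))              # 5 LiDAR points
pseudo_feat = rng.normal(size=(2, C))             # placeholder image features
Wq, Wk = rng.normal(size=(C, d)), rng.normal(size=(C, d))
Wv = rng.normal(size=(C, C))
fused = cross_attention_fuse(lidar_feat, pseudo_feat, Wq, Wk, Wv)
print(pseudo_xyz)
print(fused.shape)
```

A keypoint at the principal point (600, 180) with depth 10 lands at (0, 0, 10) in the camera frame. Once both modalities are sets of 3D points with feature vectors, attention can relate them directly, without projecting LiDAR features onto an image grid.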