PTA-Det: Point Transformer Associating Point Cloud and Image for 3D Object Detection.

Affiliations

School of Automation, Northwestern Polytechnical University, Xi'an 710129, China.

Institute of Photonics & Photon Technology, Northwest University, Xi'an 710127, China.

Publication information

Sensors (Basel). 2023 Mar 17;23(6):3229. doi: 10.3390/s23063229.

DOI:10.3390/s23063229

PMID:36991940

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10052646/

Abstract

In autonomous driving, 3D object detection based on multi-modal data has become an indispensable perception approach for handling the complex environments around a vehicle. In multi-modal detection, LiDAR and a camera are used together to capture and model the surroundings. However, because of the intrinsic discrepancies between LiDAR points and camera images, fusing the two kinds of data for object detection runs into a series of problems, and as a result most multi-modal detection methods perform worse than LiDAR-only methods. In this work, we propose a method named PTA-Det to improve the performance of multi-modal detection. Alongside PTA-Det, we propose a Pseudo Point Cloud Generation Network, which represents the textural and semantic features of keypoints in the image as pseudo points. Then, through a transformer-based Point Fusion Transition (PFT) module, the features of LiDAR points and of pseudo points from the image can be deeply fused in a unified point-based form. Combined, these modules overcome the main obstacle to cross-modal feature fusion and achieve a complementary and discriminative representation for proposal generation. Extensive experiments on the KITTI dataset support the effectiveness of PTA-Det, which achieves an mAP (mean average precision) of 77.88% on the car category with relatively few LiDAR input points.
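The abstract does not say how image keypoints are lifted into 3D to form pseudo points. As a rough illustration of the general idea only, the sketch below assumes a pinhole camera with known intrinsics and a per-keypoint depth estimate, and back-projects each keypoint into the camera frame while carrying its image feature along. The function name `backproject_keypoints`, its parameters, and the back-projection step itself are assumptions made for illustration, not the network described in the paper.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject_keypoints(
    keypoints: Sequence[Tuple[float, float]],
    depths: Sequence[float],
    features: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Tuple[Point3D, List[float]]]:
    """Lift 2D keypoints (u, v) to 3D pseudo points in the camera frame.

    Hypothetical helper: assumes a pinhole camera model and that a depth
    estimate is available for every keypoint.
    """
    if not (len(keypoints) == len(depths) == len(features)):
        raise ValueError("keypoints, depths and features must have equal length")

    pseudo_points: List[Tuple[Point3D, List[float]]] = []
    for (u, v), z, feat in zip(keypoints, depths, features):
        if z <= 0.0:
            continue  # skip keypoints without a valid (positive) depth
        # Standard pinhole back-projection from pixel coordinates to metres.
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        # Each pseudo point keeps its 3D position and its image feature vector.
        pseudo_points.append(((x, y, z), list(feat)))
    return pseudo_points


if __name__ == "__main__":
    points = backproject_keypoints(
        keypoints=[(100.0, 50.0), (10.0, 10.0)],
        depths=[10.0, 0.0],  # the second keypoint has no valid depth
        features=[[0.1, 0.2], [0.3, 0.4]],
        fx=500.0, fy=500.0, cx=100.0, cy=100.0,
    )
    print(points)
```

Once image features are expressed this way, they share the same position-plus-feature form as LiDAR points, which is the precondition for fusing both in a single point-based module.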
