Chitta Kashyap, Prakash Aditya, Jaeger Bernhard, Yu Zehao, Renz Katrin, Geiger Andreas
IEEE Trans Pattern Anal Mach Intell. 2023 Nov;45(11):12878-12895. doi: 10.1109/TPAMI.2022.3200245. Epub 2023 Oct 3.
How should we integrate representations from complementary sensors for autonomous driving? Geometry-based fusion has shown promise for perception (e.g., object detection, motion forecasting). However, in the context of end-to-end driving, we find that imitation learning based on existing sensor fusion methods underperforms in complex driving scenarios with a high density of dynamic agents. Therefore, we propose TransFuser, a mechanism to integrate image and LiDAR representations using self-attention. Our approach uses transformer modules at multiple resolutions to fuse perspective view and bird's eye view feature maps. We experimentally validate its efficacy on a challenging new benchmark with long routes and dense traffic, as well as the official leaderboard of the CARLA urban driving simulator. At the time of submission, TransFuser outperforms all prior work on the CARLA leaderboard in terms of driving score by a large margin. Compared to geometry-based fusion, TransFuser reduces the average collisions per kilometer by 48%.
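The following is a minimal PyTorch sketch of the kind of attention-based fusion the abstract describes: the perspective-view image feature map and the bird's eye view LiDAR feature map are each pooled to a small token grid, the tokens are concatenated and processed jointly by a self-attention transformer, and the fused tokens are upsampled and added back to each map residually. The class name FusionBlock, the 8x8 token grid, and all hyperparameters are illustrative assumptions for exposition, not the authors' released implementation; in the paper, fusion of this kind is applied at multiple backbone resolutions.

import torch
import torch.nn as nn


class FusionBlock(nn.Module):
    """Fuses an image feature map with a LiDAR BEV feature map via self-attention.

    A sketch under assumed shapes/hyperparameters; not the official TransFuser code.
    """

    def __init__(self, channels: int, num_heads: int = 4, token_hw: int = 8):
        super().__init__()
        # Downsample each map to a coarse token grid to keep attention tractable.
        self.pool = nn.AdaptiveAvgPool2d(token_hw)
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(layer, num_layers=1)
        self.token_hw = token_hw

    def forward(self, img_feat: torch.Tensor, lidar_feat: torch.Tensor):
        b, c, h_i, w_i = img_feat.shape
        _, _, h_l, w_l = lidar_feat.shape
        # Tokenize: (B, C, t, t) -> (B, t*t, C) for each modality.
        img_tok = self.pool(img_feat).flatten(2).transpose(1, 2)
        lid_tok = self.pool(lidar_feat).flatten(2).transpose(1, 2)
        # Joint self-attention over the concatenated token set lets every
        # image token attend to every LiDAR token and vice versa.
        fused = self.transformer(torch.cat([img_tok, lid_tok], dim=1))
        n = self.token_hw * self.token_hw
        img_out, lid_out = fused[:, :n], fused[:, n:]
        # Un-tokenize, upsample back to each map's resolution, add residually.
        img_out = img_out.transpose(1, 2).reshape(b, c, self.token_hw, self.token_hw)
        lid_out = lid_out.transpose(1, 2).reshape(b, c, self.token_hw, self.token_hw)
        img_feat = img_feat + nn.functional.interpolate(
            img_out, size=(h_i, w_i), mode="bilinear", align_corners=False
        )
        lidar_feat = lidar_feat + nn.functional.interpolate(
            lid_out, size=(h_l, w_l), mode="bilinear", align_corners=False
        )
        return img_feat, lidar_feat


# Usage: one fusion block per backbone resolution, e.g. on 64-channel maps.
img = torch.randn(2, 64, 32, 32)     # perspective-view image features
lidar = torch.randn(2, 64, 32, 32)   # bird's eye view LiDAR features
block = FusionBlock(channels=64)
img_f, lidar_f = block(img, lidar)

Pooling to a coarse grid before attention keeps the quadratic attention cost independent of the input resolution, which is the usual motivation for this design.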