Xu Minghao, Gu Yanlei, Goncharenko Igor, Kamijo Shunsuke
Graduate School of Interdisciplinary Information Studies, The University of Tokyo, Tokyo 113-0033, Japan.
Graduate School of Advanced Science and Engineering, Hiroshima University, Hiroshima 739-8527, Japan.
Sensors (Basel). 2025 Jul 10;25(14):4295. doi: 10.3390/s25144295.
The automotive industry is advancing toward fully automated driving, where perception systems rely on complementary sensors such as LiDAR and cameras to interpret the vehicle's surroundings. For vehicles at Level 4 and above, redundancy is vital to prevent safety-critical failures. One way to achieve this is by using data from one sensor type to support another. While much research has focused on reconstructing LiDAR point cloud data from camera images, limited work has addressed the reverse process: reconstructing image data from LiDAR. This paper proposes a deep learning model, named LiDAR Generative Camera (LiGenCam), to fill this gap. The model reconstructs camera images from multimodal LiDAR data, including reflectance, ambient light, and range information. LiGenCam is built on the Generative Adversarial Network framework and incorporates a pixel-wise loss and a semantic segmentation loss to guide reconstruction, ensuring both pixel-level similarity and semantic coherence. Experiments on the DurLAR dataset demonstrate that multimodal LiDAR data enhances the realism and semantic consistency of reconstructed images, and that adding the segmentation loss further improves semantic consistency. Ablation studies confirm these findings.
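To make the training objective described in the abstract concrete, the sketch below shows one plausible way to combine an adversarial term with pixel-wise and semantic segmentation losses for the generator, in the style of LiGenCam. This is not the authors' code: the module structure, the use of L1 and cross-entropy losses, and the weights lambda_pix and lambda_seg are illustrative assumptions, as is the hypothetical generator that maps a three-channel LiDAR grid (reflectance, ambient light, range) to an RGB image.

```python
# Minimal sketch (assumed, not from the paper) of a LiGenCam-style generator loss:
# adversarial + pixel-wise + semantic-segmentation terms.
import torch
import torch.nn as nn


class LiGenCamStyleLoss(nn.Module):
    def __init__(self, segmenter: nn.Module,
                 lambda_pix: float = 100.0, lambda_seg: float = 10.0):
        super().__init__()
        self.segmenter = segmenter          # pretrained segmentation network (parameters typically frozen)
        self.adv = nn.BCEWithLogitsLoss()   # adversarial (GAN) term
        self.pix = nn.L1Loss()              # pixel-level similarity term
        self.seg = nn.CrossEntropyLoss()    # semantic-consistency term
        self.lambda_pix = lambda_pix        # assumed weighting, not a value from the paper
        self.lambda_seg = lambda_seg

    def forward(self, fake_img, real_img, disc_logits_fake, seg_labels):
        # Generator wants the discriminator to judge reconstructions as real.
        l_adv = self.adv(disc_logits_fake, torch.ones_like(disc_logits_fake))
        # Pixel-wise similarity between the reconstructed and ground-truth camera image.
        l_pix = self.pix(fake_img, real_img)
        # Semantic coherence: segmentation of the reconstruction should match the labels
        # (gradients flow through fake_img even if the segmenter's weights are frozen).
        l_seg = self.seg(self.segmenter(fake_img), seg_labels)
        return l_adv + self.lambda_pix * l_pix + self.lambda_seg * l_seg


# Usage sketch with a hypothetical generator and panoramic LiDAR input grid:
# lidar = torch.randn(1, 3, 128, 2048)   # (B, C, H, W): reflectance, ambient, range channels
# fake  = generator(lidar)               # generator reconstructs an RGB camera image
```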