Chen Zhiyu, Lin Qiong, Sun Jing, Feng Yujian, Liu Shangdong, Liu Qiang, Ji Yimu, Xu He
School of Computer Science, Nanjing University of Posts and Telecommunications, No. 9 Wenyuan Road, Yadong New District, Nanjing 210023, China.
College of Automation, Nanjing University of Posts and Telecommunications, No. 9 Wenyuan Road, Yadong New District, Nanjing 210023, China.
Sensors (Basel). 2020 Dec 17;20(24):7243. doi: 10.3390/s20247243.
In this paper, we focus on LIDAR-RGB fusion-based 3D object detection. The task remains challenging in two respects: (1) differences in data formats and sensor positions cause misalignment between the semantic features of images and the geometric features of point clouds; (2) optimizing the traditional IoU is not equivalent to minimizing the bounding-box regression loss, which yields biased back-propagation for non-overlapping boxes. In this work, we propose a cascaded cross-modality fusion network (CCFNet), which comprises a cascaded multi-scale fusion module (CMF) and a novel center 3D IoU loss to address these two issues. The CMF module reinforces the discriminative representation of objects by reasoning about the relation between an object's LIDAR geometric features and its RGB semantic features across the two modalities. Specifically, CMF is inserted in a cascaded way between the RGB and LIDAR streams: it selects salient points and transmits multi-scale point-cloud features to each stage of the RGB stream. Moreover, our center 3D IoU loss incorporates the distance between anchor centers to avoid the overly simple optimization of non-overlapping bounding boxes. Extensive experiments on the KITTI benchmark demonstrate that our approach outperforms the compared methods.
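The abstract does not give the exact formulation of the center 3D IoU loss, but the stated idea, penalizing the distance between box centers so that non-overlapping boxes still receive a useful gradient, matches the well-known DIoU recipe. The sketch below is a hypothetical illustration of that recipe for axis-aligned 3D boxes parameterized as (cx, cy, cz, l, w, h); the function name and box format are assumptions, not the paper's definitions.

```python
import numpy as np

def center_3d_iou_loss(box_a, box_b):
    """DIoU-style loss for axis-aligned 3D boxes (cx, cy, cz, l, w, h).

    Hypothetical sketch: adds a normalized center-distance penalty to
    1 - IoU, so non-overlapping boxes (IoU = 0) still get a gradient
    that pulls their centers together.
    """
    ca, cb = np.asarray(box_a[:3], float), np.asarray(box_b[:3], float)
    da, db = np.asarray(box_a[3:], float), np.asarray(box_b[3:], float)

    # Min/max corners of each box.
    a_min, a_max = ca - da / 2, ca + da / 2
    b_min, b_max = cb - db / 2, cb + db / 2

    # Intersection volume (zero when the boxes do not overlap).
    overlap = np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0, None)
    inter = np.prod(overlap)
    union = np.prod(da) + np.prod(db) - inter
    iou = inter / union

    # Squared center distance, normalized by the squared diagonal of the
    # smallest enclosing box, keeping the penalty term in [0, 1).
    d2 = np.sum((ca - cb) ** 2)
    c2 = np.sum((np.maximum(a_max, b_max) - np.minimum(a_min, b_min)) ** 2)
    return 1.0 - iou + d2 / c2
```

For two identical boxes the loss is 0; for two disjoint boxes a plain 1 − IoU loss would be a constant 1, whereas the center-distance term above still varies with the gap between the boxes.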