Institute of Microelectronics, Chinese Academy of Sciences, Beijing 100029, China.
University of Chinese Academy of Sciences, Beijing 100049, China.
Sensors (Basel). 2022 Jun 18;22(12):4603. doi: 10.3390/s22124603.
Dense depth perception is critical for many applications, yet LiDAR sensors can only provide sparse depth measurements, so completing the sparse LiDAR data becomes an important task. Because RGB images carry rich textural information, researchers commonly use synchronized RGB images to guide this depth completion. However, most existing depth completion methods fuse LiDAR information with RGB image information simply through feature concatenation or element-wise addition. In view of this, this paper proposes a method that adaptively fuses the information from the two sensors by generating different convolutional kernels according to the content and positions of the feature vectors. Specifically, we divided the features into different blocks and utilized an attention network to generate a different kernel weight for each block; these kernels were then applied to fuse the multi-modal features. On the KITTI depth completion dataset, our method outperformed the state-of-the-art FCFR-Net method by 0.01 on the inverse mean absolute error (iMAE) metric. Furthermore, our method achieves a good balance between runtime and accuracy, making it well suited to real-time applications.
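The block-wise adaptive fusion described above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the learned attention network is stood in for by a random linear projection over a global-average-pooled block descriptor, and the "different kernels" are modeled as attention-weighted mixtures over a small bank of candidate 1x1 fusion kernels, one mixture per spatial block. All layer shapes and the function name `blockwise_adaptive_fusion` are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def blockwise_adaptive_fusion(rgb, lidar, block=4, n_kernels=3, seed=0):
    """Fuse two (C, H, W) feature maps with per-block adaptive 1x1 kernels.

    For each block x block spatial tile, a tiny attention head (here a
    random projection standing in for the learned network) produces mixing
    weights over a bank of candidate fusion kernels; the mixed kernel is
    applied to the concatenated RGB/LiDAR features of that tile.
    Assumes H and W are divisible by `block`.
    """
    rng = np.random.default_rng(seed)
    C, H, W = rgb.shape
    # Bank of candidate 1x1 fusion kernels: (n_kernels, C, 2C).
    bank = rng.standard_normal((n_kernels, C, 2 * C)) / np.sqrt(2 * C)
    # Attention projection: pooled (2C,) block descriptor -> n_kernels logits.
    attn_w = rng.standard_normal((n_kernels, 2 * C)) / np.sqrt(2 * C)

    x = np.concatenate([rgb, lidar], axis=0)              # (2C, H, W)
    out = np.zeros_like(rgb)
    for i in range(0, H, block):
        for j in range(0, W, block):
            tile = x[:, i:i + block, j:j + block]         # (2C, b, b)
            desc = tile.mean(axis=(1, 2))                 # global-avg-pool descriptor
            alpha = softmax(attn_w @ desc)                # (n_kernels,) mixing weights
            kernel = np.tensordot(alpha, bank, axes=1)    # (C, 2C) adaptive kernel
            flat = tile.reshape(2 * C, -1)                # 1x1 conv as a matmul
            out[:, i:i + block, j:j + block] = (kernel @ flat).reshape(C, block, block)
    return out
```

Because the kernel depends on each block's pooled content, tiles dominated by sparse LiDAR returns and tiles dominated by RGB texture are fused with different weights, which is the intuition behind the content- and position-adaptive fusion.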