Tang Jie, Tian Fei-Peng, Feng Wei, Li Jian, Tan Ping
IEEE Trans Image Process. 2021;30:1116-1129. doi: 10.1109/TIP.2020.3040528. Epub 2020 Dec 15.
Dense depth perception is critical for autonomous driving and other robotics applications. However, modern LiDAR sensors only provide sparse depth measurements. It is thus necessary to complete the sparse LiDAR data, and a synchronized guidance RGB image is often used to facilitate this completion. Many neural networks have been designed for this task. However, they often naïvely fuse the LiDAR data and RGB image information by performing feature concatenation or element-wise addition. Inspired by guided image filtering, we design a novel guided network to predict kernel weights from the guidance image. These predicted kernels are then applied to extract the depth image features. In this way, our network generates content-dependent and spatially-variant kernels for multi-modal feature fusion. Because dynamically generated spatially-variant kernels could lead to prohibitive GPU memory consumption and computation overhead, we further design a convolution factorization to reduce both. The GPU memory reduction makes it possible for feature fusion to work in a multi-stage scheme. We conduct comprehensive experiments to verify our method on real-world outdoor, indoor and synthetic datasets. Our method outperforms state-of-the-art methods on the NYUv2 dataset and ranks 1st on the KITTI depth completion benchmark at the time of submission. It also shows strong generalization capability under different 3D point densities, various lighting and weather conditions, and in cross-dataset evaluations. The code will be released for reproduction.
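The fusion idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' released code, and the abstract does not spell out the exact form of the factorization. The sketch assumes a split in the style of depthwise-separable convolution: a per-pixel, per-channel k x k stage whose kernels come from the guidance features, followed by a 1x1 cross-channel stage shared by all pixels. The function names, the linear kernel generator `predict_kernels`, and all shapes are hypothetical choices made for illustration.

```python
import numpy as np


def predict_kernels(guide, proj):
    """Hypothetical kernel generator (stand-in for a learned network).

    guide: guidance (RGB) features, shape (H, W, G)
    proj:  linear map, shape (C, k*k, G)
    Returns one k*k kernel per pixel and depth channel: (H, W, C, k*k).
    """
    return np.einsum("hwg,ckg->hwck", guide, proj)


def guided_depthwise_conv(depth_feat, kernels, k=3):
    """Spatially-variant channel-wise convolution with zero padding.

    depth_feat: depth features, shape (H, W, C)
    kernels:    per-pixel kernels, shape (H, W, C, k*k)
    """
    H, W, _ = depth_feat.shape
    pad = k // 2
    padded = np.pad(depth_feat, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros_like(depth_feat)
    for idx in range(k * k):
        dy, dx = divmod(idx, k)
        # Shifted view of the input, weighted by that tap of every pixel's kernel.
        out += padded[dy:dy + H, dx:dx + W, :] * kernels[:, :, :, idx]
    return out


def cross_channel_mix(feat, mix):
    """1x1 cross-channel stage shared by all pixels. mix: (C_in, C_out)."""
    return feat @ mix


def kernel_weight_counts(H, W, C_in, C_out, k):
    """Kernel weights to store: full spatially-variant conv vs. the assumed split."""
    full = H * W * C_in * C_out * k * k
    factorized = H * W * C_in * k * k + C_in * C_out
    return full, factorized
```

Under these assumptions, the per-pixel kernel storage no longer scales with the product of input and output channels, which is the kind of saving that would make repeating the fusion at several stages affordable. For example, `kernel_weight_counts(5, 6, 4, 8, 3)` compares the two weight counts for a toy 5 x 6 feature map with 4 input and 8 output channels.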