Zhai Hongjia, Huang Gan, Hu Qirui, Li Guanglin, Bao Hujun, Zhang Guofeng
IEEE Trans Vis Comput Graph. 2024 Nov;30(11):7129-7139. doi: 10.1109/TVCG.2024.3456201. Epub 2024 Oct 10.
In recent years, neural implicit representations have attracted substantial attention in Simultaneous Localization and Mapping (SLAM). However, existing approaches leave a notable gap in scene understanding. In this paper, we introduce NIS-SLAM, an efficient neural implicit semantic RGB-D SLAM system that leverages a pre-trained 2D segmentation network to learn consistent semantic representations. Specifically, for high-fidelity surface reconstruction and spatially consistent scene understanding, we combine high-frequency multi-resolution tetrahedron-based features with low-frequency positional encoding as the implicit scene representation. In addition, to address the inconsistency of 2D segmentation results across views, we propose a fusion strategy that integrates the semantic probabilities from previous non-keyframes into keyframes to achieve consistent semantic learning. Furthermore, we implement confidence-based pixel sampling and a progressive optimization weight function for robust camera tracking. Extensive experiments on various datasets show that our system performs better than, or competitively with, existing neural dense implicit RGB-D SLAM approaches. Finally, we show that our approach can be used in augmented reality applications. Project page: https://zju3dv.github.io/nis_slam.
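The abstract does not spell out the fusion rule, so the following is only an illustrative sketch of how per-pixel semantic probabilities from several views might be merged into a keyframe. It assumes the non-keyframe predictions have already been warped to the keyframe pixel, and it uses a plain weighted average with renormalization, which may differ from what NIS-SLAM actually does. The names `fuse_pixel_probs` and `argmax_label` are hypothetical.

```python
from typing import List, Optional, Sequence


def fuse_pixel_probs(
    keyframe_probs: Sequence[float],
    neighbor_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Fuse one pixel's class probabilities with those from other views.

    Assumed rule (not taken from the paper): a weighted sum of the
    probability vectors, renormalized to sum to 1. The keyframe's own
    prediction always enters with weight 1.0.
    """
    if weights is None:
        weights = [1.0] * len(neighbor_probs)
    if len(weights) != len(neighbor_probs):
        raise ValueError("one weight per neighbor view is required")

    fused = list(keyframe_probs)
    for probs, w in zip(neighbor_probs, weights):
        if len(probs) != len(fused):
            raise ValueError("all views must use the same class set")
        for c, p in enumerate(probs):
            fused[c] += w * p

    total = sum(fused)
    if total <= 0.0:
        raise ValueError("probabilities must have positive mass")
    return [p / total for p in fused]


def argmax_label(probs: Sequence[float]) -> int:
    """Return the index of the most probable class."""
    return max(range(len(probs)), key=lambda c: probs[c])


# The keyframe alone prefers class 1, but two earlier views prefer class 0.
fused = fuse_pixel_probs([0.4, 0.6], [[0.8, 0.2], [0.7, 0.3]])
label = argmax_label(fused)
```

In this toy case the single-view label for the keyframe pixel is overruled by the evidence accumulated from the other views, which is the kind of multi-view consistency the fusion strategy is aiming for.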