

LVID-SLAM: A Lightweight Visual-Inertial SLAM for Dynamic Scenes Based on Semantic Information.

Authors

Wang Shuwen, Hu Qiming, Zhang Xu, Li Wei, Wang Ying, Zheng Enhui

Affiliation

School of Mechanical and Electrical Engineering, China Jiliang University, Hangzhou 310018, China.

Publication

Sensors (Basel). 2025 Jul 1;25(13):4117. doi: 10.3390/s25134117.

DOI: 10.3390/s25134117
PMID: 40648372
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12251585/
Abstract

Simultaneous Localization and Mapping (SLAM) remains challenging in dynamic environments. Recent approaches combining deep learning with algorithms for dynamic scenes comprise two types: faster, less accurate object detection-based methods and highly accurate, computationally costly instance segmentation-based methods. In addition, maps lacking semantic information hinder robots from understanding their environment and performing complex tasks. This paper presents a lightweight visual-inertial SLAM system. The system is based on the classic ORB-SLAM3 framework, which starts a new thread for object detection and tightly couples the semantic information of object detection with geometric information to remove feature points from dynamic objects. In addition, Inertial Measurement Unit (IMU) data are employed to assist in feature point extraction, thereby compensating for visual pose tracking loss. Finally, a dense octree-based semantic map is constructed by fusing semantic information and visualized using ROS. LVID-SLAM demonstrates excellent pose accuracy and robustness in highly dynamic scenes on the public TUM dataset, with an average ATE reduction of more than 80% compared to ORB-SLAM3. The experimental results demonstrate that LVID-SLAM outperforms other methods in dynamic conditions, offering both real-time capability and robustness.

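The dynamic-feature rejection the abstract describes (coupling object-detection semantics with geometry to discard feature points on movable objects) can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, the (x1, y1, x2, y2) box format, and the set of "dynamic" categories are assumptions for illustration only.

```python
# Illustrative sketch of semantic dynamic-point rejection: keypoints that fall
# inside the bounding box of a detected movable object are discarded before
# pose estimation, so only (presumably) static features constrain the pose.

DYNAMIC_CLASSES = {"person", "car"}  # assumed movable categories

def point_in_box(pt, box):
    """Return True if a 2-D keypoint lies inside an axis-aligned box (x1, y1, x2, y2)."""
    x, y = pt
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2

def filter_static_keypoints(keypoints, detections):
    """Keep only keypoints outside every dynamic-object detection.

    keypoints: list of (x, y) pixel coordinates from the feature extractor.
    detections: list of (class_name, (x1, y1, x2, y2)) from the detector thread.
    """
    dynamic_boxes = [box for cls, box in detections if cls in DYNAMIC_CLASSES]
    return [pt for pt in keypoints
            if not any(point_in_box(pt, box) for box in dynamic_boxes)]

kps = [(10, 10), (55, 60), (200, 150)]
dets = [("person", (40, 40, 120, 180)), ("chair", (190, 140, 260, 220))]
print(filter_static_keypoints(kps, dets))  # → [(10, 10), (200, 150)]
```

A simple bounding-box test like this is what makes detection-based methods cheaper than instance segmentation: the cost is one rectangle check per keypoint, at the price of also rejecting static background points that happen to fall inside a dynamic object's box.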

Figures (PMC12251585):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d23d/12251585/bba904bb0e08/sensors-25-04117-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d23d/12251585/227cac74072c/sensors-25-04117-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d23d/12251585/551980fe5921/sensors-25-04117-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d23d/12251585/3b994f1e7d71/sensors-25-04117-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d23d/12251585/e62acfad72dd/sensors-25-04117-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d23d/12251585/8ab5513ad604/sensors-25-04117-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d23d/12251585/a1c83c9c7590/sensors-25-04117-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d23d/12251585/f2526358cf16/sensors-25-04117-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d23d/12251585/b5f574130246/sensors-25-04117-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d23d/12251585/cef6241a47ba/sensors-25-04117-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d23d/12251585/c5cda48ca964/sensors-25-04117-g011.jpg

Similar Articles

1. LVID-SLAM: A Lightweight Visual-Inertial SLAM for Dynamic Scenes Based on Semantic Information.
Sensors (Basel). 2025 Jul 1;25(13):4117. doi: 10.3390/s25134117.
2. SGDO-SLAM: A Semantic RGB-D SLAM System with Coarse-to-Fine Dynamic Rejection and Static Weighted Optimization.
Sensors (Basel). 2025 Jun 14;25(12):3734. doi: 10.3390/s25123734.
3. PLY-SLAM: Semantic Visual SLAM Integrating Point-Line Features with YOLOv8-seg in Dynamic Scenes.
Sensors (Basel). 2025 Jun 7;25(12):3597. doi: 10.3390/s25123597.
4. Short-Term Memory Impairment
5. SGF-SLAM: Semantic Gaussian Filtering SLAM for Urban Road Environments.
Sensors (Basel). 2025 Jun 7;25(12):3602. doi: 10.3390/s25123602.
6. LET-SE2-VINS: A Hybrid Optical Flow Framework for Robust Visual-Inertial SLAM.
Sensors (Basel). 2025 Jun 20;25(13):3837. doi: 10.3390/s25133837.
7. BY-SLAM: Dynamic Visual SLAM System Based on BEBLID and Semantic Information Extraction.
Sensors (Basel). 2024 Jul 19;24(14):4693. doi: 10.3390/s24144693.
8. SEG-SLAM: Dynamic Indoor RGB-D Visual SLAM Integrating Geometric and YOLOv5-Based Semantic Information.
Sensors (Basel). 2024 Mar 25;24(7):2102. doi: 10.3390/s24072102.
9. ADM-SLAM: Accurate and Fast Dynamic Visual SLAM with Adaptive Feature Point Extraction, Deeplabv3pro, and Multi-View Geometry.
Sensors (Basel). 2024 Jun 2;24(11):3578. doi: 10.3390/s24113578.
10. SOLO-SLAM: A Parallel Semantic SLAM Algorithm for Dynamic Scenes.
Sensors (Basel). 2022 Sep 15;22(18):6977. doi: 10.3390/s22186977.
