Zhu Daixian, Liu Peixuan, Qiu Qiang, Wei Jiaxin, Gong Ruolin
College of Communication and Information Engineering, Xi'an University of Science and Technology, Xi'an 710054, China.
Sensors (Basel). 2024 Jul 19;24(14):4693. doi: 10.3390/s24144693.
SLAM is a critical technology for enabling autonomous navigation and positioning in unmanned vehicles. Traditional visual simultaneous localization and mapping algorithms are built upon the assumption of a static scene, overlooking the impact of dynamic targets in real-world environments. Interference from dynamic targets can significantly degrade the system's localization accuracy or even lead to tracking failure. To address these issues, we propose a dynamic visual SLAM system named BY-SLAM, which is based on BEBLID and semantic information extraction. First, the BEBLID descriptor is introduced to describe Oriented FAST feature points, improving both the accuracy and the speed of feature point matching. Next, FasterNet replaces the backbone network of YOLOv8s to expedite semantic information extraction. By applying DBSCAN clustering to the object detection results, a more refined semantic mask is obtained. Finally, by leveraging the semantic mask and epipolar constraints, dynamic feature points are identified and eliminated, so that only static feature points are used for pose estimation and for constructing a dense 3D map that excludes dynamic targets. Experimental evaluations conducted on both the TUM RGB-D dataset and real-world scenes demonstrate the effectiveness of the proposed algorithm in filtering out dynamic targets. On average, localization accuracy on the TUM RGB-D dataset improves by 95.53% compared to ORB-SLAM3. Comparative analyses against classical dynamic SLAM systems further corroborate the improvements in localization accuracy, map readability, and robustness achieved by BY-SLAM.
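The dynamic-point rejection described in the abstract combines two cues: membership in the semantic mask of a movable object, and violation of the epipolar constraint between two frames. The following minimal Python sketch illustrates only that decision rule; the fundamental matrix `F`, the threshold `thresh`, and the function names `epipolar_distance` / `is_dynamic` are illustrative assumptions, not the paper's actual implementation.

```python
import math

def epipolar_distance(F, p1, p2):
    """Distance of p2 (in image 2) from the epipolar line F @ [p1, 1]^T.

    F is a 3x3 fundamental matrix given as nested lists; p1, p2 are (x, y)
    pixel coordinates of a matched feature pair.
    """
    x1, y1 = p1
    x2, y2 = p2
    # Epipolar line in the second image: l = F * [x1, y1, 1]^T
    l = [F[i][0] * x1 + F[i][1] * y1 + F[i][2] for i in range(3)]
    num = abs(l[0] * x2 + l[1] * y2 + l[2])
    den = math.hypot(l[0], l[1])
    return num / den if den > 0 else 0.0

def is_dynamic(F, p1, p2, in_mask, thresh=1.0):
    """Flag a feature as dynamic if it falls inside the semantic mask of a
    movable object (in_mask) or deviates from its epipolar line by more
    than thresh pixels (illustrative threshold)."""
    return in_mask or epipolar_distance(F, p1, p2) > thresh
```

For a purely horizontal camera translation, F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]] constrains matches to the same scanline, so a match whose y-coordinate drifts between frames is rejected as dynamic while on-line matches survive for pose estimation.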