Zhang Tianjun, Zhang Lin, Chen Yang, Zhou Yicong
IEEE Trans Image Process. 2022;31:6562-6576. doi: 10.1109/TIP.2022.3213189. Epub 2022 Oct 21.
Nowadays, visual SLAM (Simultaneous Localization And Mapping) has become a hot research topic due to its low cost and wide range of applications. Traditional visual SLAM frameworks are usually designed for single-agent systems, completing both the localization and the mapping with sensors mounted on a single robot or mobile device. However, the mobility and work capacity of a single agent are usually limited. In practice, robots or mobile devices may be deployed in clusters, such as drone formations, wearable motion capture systems, and so on. To the best of our knowledge, existing SLAM systems designed for multiple agents are still scarce, and most of them have non-negligible functional limitations. Specifically, on one hand, most existing multi-agent SLAM systems can only extract some key features and build sparse maps. On the other hand, schemes that can densely reconstruct the environment still depend on depth sensors, such as RGB-D cameras or LiDARs. Systems that can yield high-density maps using only monocular camera suites are currently lacking. As an attempt to fill this research gap, we design a novel collaborative SLAM system, namely CVIDS (Collaborative Visual-Inertial Dense SLAM), which follows a centralized and loosely coupled framework and can be integrated with any existing Visual-Inertial Odometry (VIO) to accomplish co-localization and dense reconstruction. By integrating our proposed robust loop-closure detection module and a two-stage pose-graph optimization pipeline, the co-localization module of CVIDS efficiently estimates the poses of different agents in a unified coordinate system from the packed images and local poses sent by the client ends of the agents. In addition, our motion-based dense mapping module effectively recovers the 3D structures of selected keyframes and then fuses their depth information into the global map for reconstruction. The superior performance of CVIDS is corroborated by both quantitative and qualitative experimental results. To make our results reproducible, the source code has been released at https://cslinzhang.github.io/CVIDS.
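The co-localization described above hinges on expressing each agent's locally estimated trajectory in one shared coordinate frame once an inter-agent loop closure is found. The sketch below is a conceptual illustration of that frame alignment only, not the released CVIDS code; all helper names (make_pose, rot_z, T_WA_WB, etc.) and the example values are assumptions made for illustration.

```python
# Conceptual sketch (not the CVIDS implementation): place two agents'
# locally estimated poses into a unified coordinate frame after a single
# inter-agent loop closure provides a relative transform between keyframes.
import numpy as np

def make_pose(R, t):
    """Build a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def rot_z(theta):
    """Rotation about the z-axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Pose of keyframe i estimated by agent A's VIO, in A's world frame W_A.
T_WA_i = make_pose(rot_z(0.3), np.array([1.0, 2.0, 0.0]))
# Pose of keyframe j estimated by agent B's VIO, in B's world frame W_B.
T_WB_j = make_pose(rot_z(-0.1), np.array([4.0, -1.0, 0.5]))
# Relative pose of B's keyframe j expressed in A's keyframe-i frame,
# as would be recovered by an inter-agent loop closure.
T_i_j = make_pose(rot_z(0.05), np.array([0.2, 0.1, 0.0]))

# Transform mapping B's world frame into A's world frame:
#   T_WA_WB = T_WA_i * T_i_j * inv(T_WB_j)
T_WA_WB = T_WA_i @ T_i_j @ np.linalg.inv(T_WB_j)

# Any pose reported by agent B can now be expressed in the unified frame W_A.
T_WB_k = make_pose(rot_z(0.2), np.array([5.0, 0.0, 0.5]))  # another keyframe of B
T_WA_k = T_WA_WB @ T_WB_k
print(np.round(T_WA_k, 3))
```

In a full system this single-closure alignment would only initialize the unified frame; the pose graph over all agents' keyframes would then be refined jointly, which is the role the abstract assigns to the two-stage pose-graph optimization pipeline.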