Herrera-Granda Erick P, Torres-Cantero Juan C, Peluffo-Ordóñez Diego H
Department of Mathematics, Escuela Politécnica Nacional, Ladrón de Guevara E11-235, Quito, 170525, Ecuador.
Virtual Reality Laboratory, ETSIIT, Department of Computer Languages and Systems, University of Granada, c/Periodista Manuel Saucedo Aranda, s/n, 18071, Granada, Spain.
Heliyon. 2024 Sep 6;10(18):e37356. doi: 10.1016/j.heliyon.2024.e37356. eCollection 2024 Sep 30.
Monocular Simultaneous Localization and Mapping (SLAM), Visual Odometry (VO), and Structure from Motion (SFM) are techniques that have emerged recently to address the problem of reconstructing objects or environments using monocular cameras. Purely visual monocular techniques have become attractive solutions for 3D reconstruction tasks because they are affordable, lightweight, easy to deploy, and perform well outdoors. They are also available on most handheld devices without requiring additional input devices. In this work, we present a comprehensive overview of SLAM, VO, and SFM solutions to the 3D reconstruction problem that use a monocular RGB camera as the only source of information. Our aim is to gather basic knowledge about this ill-posed problem and to classify the existing techniques according to a taxonomy. To achieve this goal, we extended the existing taxonomy to cover all the classifications currently found in the literature, comprising classic, machine learning, direct, indirect, dense, and sparse methods. We reviewed 42 methods in detail (18 classic and 24 machine learning methods), organized according to the ten categories defined in our extended taxonomy, systematizing their algorithms and providing their basic formulations. The relevant information about each algorithm was summarized using nine criteria for classic methods and eleven criteria for machine learning methods, giving the reader the elements needed to decide how to implement, select, or design a 3D reconstruction system. Finally, we analyzed the temporal evolution of each category and found that the classical-sparse-indirect and classical-dense-indirect categories have been the most widely accepted solutions to the monocular 3D reconstruction problem over the last 18 years.