Huang Shi-Sheng, Chen Haoxiang, Huang Jiahui, Fu Hongbo, Hu Shi-Min
IEEE Trans Vis Comput Graph. 2023 Apr;29(4):1977-1991. doi: 10.1109/TVCG.2021.3137912. Epub 2023 Feb 28.
Maintaining global consistency continues to be critical for online 3D indoor scene reconstruction. However, it remains challenging for previous approaches based on purely geometric analysis to produce globally consistent 3D reconstructions, even with bundle adjustment or loop closure techniques. In this article, we propose a novel real-time 3D reconstruction approach that effectively integrates both semantic and geometric cues. The key challenge is how to map this indicative information, i.e., semantic priors, into a metric space as measurable information, thus enabling more accurate semantic fusion that leverages both the geometric and semantic cues. To this end, we introduce a semantic space with a continuous metric function measuring the distance between discrete semantic observations. Within this semantic space, we present an accurate frame-to-model semantic tracker for camera pose estimation, and a semantic pose graph with semantic links between submaps for globally consistent 3D scene reconstruction. Through extensive evaluation on public synthetic and real-world 3D indoor scene RGB-D datasets, we show that our approach outperforms previous 3D scene reconstruction approaches both quantitatively and qualitatively, especially in terms of global consistency.
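To make the idea of a continuous metric over discrete semantic observations concrete, here is a minimal illustrative sketch: discrete labels are mapped to vectors in a hypothetical embedding, and cosine distance between the vectors serves as the continuous distance. The embedding values and the choice of cosine distance are assumptions for illustration only, not the paper's actual semantic-space construction.

```python
import math

# Hypothetical label embeddings standing in for a learned semantic space
# (assumed values; the paper's actual construction may differ).
EMBED = {
    "chair": (0.9, 0.1, 0.0),
    "sofa":  (0.7, 0.3, 0.1),  # semantically close to "chair"
    "wall":  (0.0, 0.1, 0.9),  # semantically far from both
}

def semantic_distance(a: str, b: str) -> float:
    """Cosine distance between embedded labels: a continuous measure
    of dissimilarity between discrete semantic observations."""
    va, vb = EMBED[a], EMBED[b]
    dot = sum(x * y for x, y in zip(va, vb))
    na = math.sqrt(sum(x * x for x in va))
    nb = math.sqrt(sum(x * x for x in vb))
    return 1.0 - dot / (na * nb)
```

Under such a function, related categories (e.g., "chair" and "sofa") are measurably closer than unrelated ones (e.g., "chair" and "wall"), which is what allows semantic observations to be weighed alongside geometric residuals in tracking and pose-graph optimization.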