Ye Sheng, Hu Yubin, Lin Matthieu, Wen Yu-Hui, Zhao Wang, Liu Yong-Jin, Wang Wenping
IEEE Trans Vis Comput Graph. 2025 Sep;31(9):5275-5287. doi: 10.1109/TVCG.2024.3444036.
Reconstructing indoor scenes from multi-view RGB images is challenging because flat, texture-less regions coexist with delicate, fine-grained ones. Recent methods leverage neural radiance fields aided by predicted surface normal priors to recover the scene geometry. These methods excel at producing complete and smooth results for floor and wall areas. However, they struggle to capture complex surfaces with high-frequency structures, because the neural representation has limited capacity and the predicted normal priors are inaccurate. This work aims to reconstruct high-fidelity surfaces with fine-grained details by addressing these two limitations. To improve the capacity of the implicit representation, we propose a hybrid architecture that represents low-frequency and high-frequency regions separately. To enhance the normal priors, we introduce a simple yet effective image sharpening and denoising technique, coupled with a network that estimates the pixel-wise uncertainty of the predicted surface normal vectors. Identifying this uncertainty prevents our model from being misled by unreliable surface normal supervision, which hinders the accurate reconstruction of intricate geometries. Experiments on benchmark datasets show that our method outperforms existing methods in reconstruction quality. Furthermore, the proposed method generalizes well to real-world indoor scenes captured with hand-held mobile phones.
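The idea of down-weighting unreliable normal supervision can be illustrated with a minimal sketch. The abstract does not give the loss formulation, so everything below is an assumption made for illustration: the function name `uncertainty_weighted_normal_loss`, the weight `1 - uncertainty`, the cosine-based error term, and the convention that uncertainty lies in [0, 1]. It is not the authors' implementation.

```python
import math
from typing import List, Sequence

Vec3 = Sequence[float]


def normalize(v: Vec3) -> List[float]:
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [c / norm for c in v]


def uncertainty_weighted_normal_loss(
    rendered: Sequence[Vec3],
    prior: Sequence[Vec3],
    uncertainty: Sequence[float],
) -> float:
    """Weighted mean of the per-pixel angular error (1 - cosine similarity).

    Hypothetical formulation: each pixel is weighted by (1 - uncertainty),
    so a prior normal flagged as unreliable contributes little to the loss.
    """
    if not (len(rendered) == len(prior) == len(uncertainty)):
        raise ValueError("inputs must have the same number of pixels")

    weighted_sum = 0.0
    weight_total = 0.0
    for n_rendered, n_prior, u in zip(rendered, prior, uncertainty):
        a = normalize(n_rendered)
        b = normalize(n_prior)
        # Clamp uncertainty to [0, 1] before turning it into a weight.
        weight = 1.0 - min(max(u, 0.0), 1.0)
        cosine = sum(x * y for x, y in zip(a, b))
        weighted_sum += weight * (1.0 - cosine)
        weight_total += weight

    # If every pixel is fully uncertain, there is nothing to supervise.
    return weighted_sum / weight_total if weight_total > 0.0 else 0.0
```

In this sketch, a pixel whose prior normal points the wrong way but carries uncertainty 1 is ignored entirely, while the same pixel with uncertainty 0 contributes its full angular error to the loss.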