Chen Hao-Xiang, Li Jiayi, Mu Tai-Jiang, Hu Shi-Min
IEEE Trans Vis Comput Graph. 2025 Sep;31(9):5188-5198. doi: 10.1109/TVCG.2024.3439583.
Neural Radiance Fields (NeRFs) have shown impressive capabilities in synthesizing photorealistic novel views. However, their application to room-size scenes is limited by the requirement of several hundred views with accurate poses for training. To address this challenge, we propose SN$^{2}$eRF, a framework that reconstructs the neural radiance field from significantly fewer views with noisy poses by exploiting multiple priors. Our key insight is to leverage both multi-view and monocular priors to constrain the optimization of NeRF under sparse and noisy pose inputs. Specifically, we extract and match key points to constrain pose optimization, and use a Ray Transformer together with a monocular depth estimator to provide a dense depth prior for geometry optimization. Benefiting from these priors, our approach achieves state-of-the-art accuracy in novel view synthesis for indoor room scenarios.
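To illustrate the dense depth prior mentioned above, the sketch below shows one common way such a prior is applied: monocular depth estimates are only defined up to an unknown scale and shift, so they are first aligned to the rendered depth by least squares before computing a supervision loss. This is a minimal NumPy sketch under that assumption; the paper's actual alignment and loss may differ.

```python
import numpy as np

def align_depth(prior: np.ndarray, rendered: np.ndarray) -> np.ndarray:
    """Fit scale s and shift t so that s * prior + t best matches rendered depth.

    Monocular depth estimators output relative depth, so a per-image
    scale/shift alignment (assumed here) is needed before supervision.
    """
    A = np.stack([prior, np.ones_like(prior)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, rendered, rcond=None)
    return s * prior + t

def depth_prior_loss(prior: np.ndarray, rendered: np.ndarray) -> float:
    """Mean squared error between aligned monocular depth and rendered depth."""
    aligned = align_depth(prior, rendered)
    return float(np.mean((aligned - rendered) ** 2))
```

For example, if the rendered depth is an exact affine transform of the prior, the loss is (numerically) zero, and any geometric disagreement beyond scale and shift is penalized.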