Deng Yu, Zhao Baozhu, Su Junyan, Zhang Xiaohan, Liu Qi
Department of Future Technology, South China University of Technology, Guangzhou, 511400, China.
Department of Future Technology, South China University of Technology, Guangzhou, 511400, China. Electronic address: https://drliuqi.github.io/.
Neural Netw. 2026 Mar;195:108320. doi: 10.1016/j.neunet.2025.108320. Epub 2025 Nov 13.
Three-dimensional reconstruction in scenes with extreme depth variations remains challenging due to inconsistent supervisory signals between near-field and far-field regions. Existing methods fail to simultaneously address inaccurate depth estimation in distant areas and structural degradation in close-range regions. This paper proposes a novel computational framework that integrates depth-of-field supervision and multi-view consistency supervision to advance 3D Gaussian Splatting. Our approach comprises two core components: (1) Depth-of-field Supervision employs a scale-recovered monocular depth estimator (e.g., Metric3D) to generate depth priors, leverages defocus convolution to synthesize physically accurate defocused images, and enforces geometric consistency through a novel depth-of-field loss, thereby enhancing depth fidelity in both far-field and near-field regions; (2) Multi-View Consistency Supervision employs LoFTR-based semi-dense feature matching to minimize cross-view geometric errors and enforces depth consistency via least-squares optimization over reliable matched points. By unifying defocus physics with multi-view geometric constraints, our method achieves superior depth fidelity, demonstrating a 0.8 dB PSNR improvement over the state-of-the-art method on the Waymo Open Dataset. This framework bridges physical imaging principles and learning-based depth regularization, offering a scalable solution for complex depth stratification in urban environments.
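The abstract mentions two concrete numerical ingredients: a thin-lens defocus model used to synthesize defocused images from depth priors, and a least-squares fit that aligns monocular depth to reliable matched points. The sketch below illustrates one plausible form of each under common assumptions (the thin-lens circle-of-confusion formula, and a scale-plus-shift affine alignment); the paper's actual formulation and all function names here are hypothetical, not taken from the source.

```python
import numpy as np

def coc_diameter(depth, focus_dist, focal_len, aperture):
    """Thin-lens circle-of-confusion diameter at a given scene depth.

    Standard thin-lens model: c = A * f * |d - d_f| / (d * (d_f - f)),
    where A is the aperture diameter, f the focal length, d_f the focus
    distance, and d the scene depth. The kernel size used for defocus
    convolution would typically be derived from this diameter.
    """
    return aperture * focal_len * np.abs(depth - focus_dist) / (
        depth * (focus_dist - focal_len))

def align_depth_affine(d_mono, d_matched):
    """Least-squares scale/shift alignment: find s, t minimizing
    || s * d_mono + t - d_matched ||^2 over matched points."""
    A = np.stack([d_mono, np.ones_like(d_mono)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, d_matched, rcond=None)
    return s, t

# Toy check: a point exactly in focus has zero circle of confusion.
c_in_focus = coc_diameter(depth=10.0, focus_dist=10.0,
                          focal_len=0.05, aperture=0.02)

# Toy check: matched-point depths that are an exact affine transform
# of the monocular depths are recovered exactly.
d_mono = np.array([1.0, 2.0, 3.0, 4.0])
d_matched = 2.5 * d_mono + 0.3
s, t = align_depth_affine(d_mono, d_matched)
```

In a full pipeline, `s` and `t` would rescale the monocular prior before it supervises the Gaussian depths, and the per-pixel circle of confusion would select a spatially varying blur kernel for the defocus-consistency loss.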