

Multi-step depth enhancement refine network with multi-view stereo.

Authors

Ding Yuxuan, Li Kefeng, Zhang Guangyuan, Zhu Zhenfang, Wang Peng, Wang Zhenfei, Fu Chen, Li Guangchen, Pan Ke

Affiliations

College of Information Science and Electrical Engineering, Shandong Jiaotong University, Jinan, Shandong, China.

Shandong Zhengyuan Yeda Environmental Technology Co., Ltd, Jinan, Shandong, China.

Publication

PLoS One. 2025 Feb 13;20(2):e0314418. doi: 10.1371/journal.pone.0314418. eCollection 2025.

DOI: 10.1371/journal.pone.0314418
PMID: 39946337
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11824967/
Abstract

This paper introduces an innovative multi-view stereo matching network, the Multi-Step Depth Enhancement Refine Network (MSDER-MVS), aimed at improving the accuracy and computational efficiency of high-resolution 3D reconstruction. The MSDER-MVS network leverages the potent capabilities of modern deep learning in conjunction with the geometric intuition of traditional 3D reconstruction techniques, with a particular focus on optimizing the quality of the depth map and the efficiency of the reconstruction process.

Our key innovations include a dual-branch fusion structure and a Feature Pyramid Network (FPN) to effectively extract and integrate multi-scale features. With this approach, we construct depth maps progressively from coarse to fine, continuously improving depth prediction accuracy at each refinement stage. For cost volume construction, we employ a variance-based metric to integrate information from multiple perspectives, optimizing the consistency of the estimates. Moreover, we introduce a differentiable depth optimization process that iteratively enhances the quality of depth estimation using residuals and the Jacobian matrix, without the need for additional learnable parameters. This innovation significantly increases the network's convergence rate and the fineness of depth prediction.

Extensive experiments on the standard DTU dataset (Aanas H, 2016) show that MSDER-MVS surpasses current advanced methods in accuracy, completeness, and overall performance metrics. Particularly in scenarios rich in detail, our method more precisely recovers surface details and textures, demonstrating its effectiveness and superiority for practical applications. Overall, the MSDER-MVS network offers a robust solution for precise and efficient 3D scene reconstruction.

Looking forward, we aim to extend this approach to more complex environments and larger-scale datasets, further enhancing the model's generalization and real-time processing capabilities, and promoting the widespread deployment of multi-view stereo matching technology in practical applications.
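The variance-based cost metric mentioned in the abstract is a standard aggregation step in learning-based MVS: feature volumes from each view (already warped to the reference camera at each depth hypothesis) are fused by their per-element variance, which is low where a depth hypothesis is photo-consistent across views. A minimal numpy sketch of that idea follows; the shapes and function name are illustrative, not the paper's implementation.

```python
import numpy as np

def variance_cost_volume(feature_volumes):
    """Fuse per-view feature volumes into one cost volume.

    feature_volumes: shape (V, D, H, W, C) -- V views, D depth
    hypotheses, spatial size H x W, C feature channels. Each view's
    features are assumed already warped to the reference camera at
    every depth hypothesis. Variance across views is low where a
    hypothesis matches the true surface, so it serves as a cost.
    """
    mean = feature_volumes.mean(axis=0)                 # (D, H, W, C)
    var = ((feature_volumes - mean) ** 2).mean(axis=0)  # (D, H, W, C)
    return var.mean(axis=-1)                            # (D, H, W)

# Toy example: 3 views, 4 depth hypotheses, 2x2 image, 8 channels.
rng = np.random.default_rng(0)
vols = rng.standard_normal((3, 4, 2, 2, 8))
cost = variance_cost_volume(vols)
best_depth_index = cost.argmin(axis=0)  # per-pixel winning hypothesis
```

Selecting the minimum-cost hypothesis per pixel (or a softmax-weighted expectation over hypotheses, as most MVS networks do) yields the coarse depth map that later refinement stages improve.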

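The parameter-free refinement the abstract describes — iteratively improving depth with residuals and the Jacobian — resembles a Gauss-Newton update on a per-pixel matching residual. The scalar sketch below shows that update shape under simplified assumptions; the residual function here is a stand-in, not the paper's actual photometric residual.

```python
import numpy as np

def gauss_newton_depth_refine(d0, residual_fn, jacobian_fn, iters=5):
    """Iteratively refine a depth map without learned parameters.

    Each step applies the 1-D Gauss-Newton update
        d <- d - J * r / (J * J)
    where r = residual_fn(d) is a per-pixel matching residual and
    J = jacobian_fn(d) is its derivative with respect to depth.
    """
    d = d0.astype(float).copy()
    for _ in range(iters):
        r = residual_fn(d)
        J = jacobian_fn(d)
        # Damp the denominator to avoid division by zero where J ~ 0.
        d -= J * r / (J * J + 1e-8)
    return d

# Stand-in residual: true depth 2.0 everywhere, r(d) = d - 2.0, J = 1.
d0 = np.full((2, 2), 5.0)
refined = gauss_newton_depth_refine(
    d0, lambda d: d - 2.0, lambda d: np.ones_like(d))
```

Because the update uses only residuals and Jacobians computed from the inputs, it adds no learnable weights, which is consistent with the abstract's claim of faster convergence without extra parameters.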

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6882/11824967/373576adca5e/pone.0314418.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6882/11824967/c46fb857c449/pone.0314418.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6882/11824967/bfbae21c2507/pone.0314418.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6882/11824967/98902783d839/pone.0314418.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6882/11824967/603dd01c0e66/pone.0314418.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6882/11824967/4554f827bf6e/pone.0314418.g006.jpg

Similar articles

1. Multi-step depth enhancement refine network with multi-view stereo.
PLoS One. 2025 Feb 13;20(2):e0314418. doi: 10.1371/journal.pone.0314418. eCollection 2025.
2. LNMVSNet: A Low-Noise Multi-View Stereo Depth Inference Method for 3D Reconstruction.
Sensors (Basel). 2024 Apr 9;24(8):2400. doi: 10.3390/s24082400.
3. OD-MVSNet: Omni-dimensional dynamic multi-view stereo network.
PLoS One. 2024 Aug 15;19(8):e0309029. doi: 10.1371/journal.pone.0309029. eCollection 2024.
4. Brain tumor segmentation and detection in MRI using convolutional neural networks and VGG16.
Cancer Biomark. 2025 Mar;42(3):18758592241311184. doi: 10.1177/18758592241311184. Epub 2025 Apr 4.
5. BSI-MVS: multi-view stereo network with bidirectional semantic information.
Sci Rep. 2024 Mar 21;14(1):6766. doi: 10.1038/s41598-024-55612-6.
6. A Light Multi-View Stereo Method with Patch-Uncertainty Awareness.
Sensors (Basel). 2024 Feb 17;24(4):1293. doi: 10.3390/s24041293.
7. Neural Radiance Field-Inspired Depth Map Refinement for Accurate Multi-View Stereo.
J Imaging. 2024 Mar 8;10(3):68. doi: 10.3390/jimaging10030068.
8. Miper-MVS: Multi-scale iterative probability estimation with refinement for efficient multi-view stereo.
Neural Netw. 2023 May;162:502-515. doi: 10.1016/j.neunet.2023.03.012. Epub 2023 Mar 17.
9. Visibility-Aware Point-Based Multi-View Stereo Network.
IEEE Trans Pattern Anal Mach Intell. 2021 Oct;43(10):3695-3708. doi: 10.1109/TPAMI.2020.2988729. Epub 2021 Sep 2.
10. Optimization of sparse-view CT reconstruction based on convolutional neural network.
Med Phys. 2025 Apr;52(4):2089-2105. doi: 10.1002/mp.17636. Epub 2025 Feb 2.

Cited by

1. Fourier Lightfield Multiview Stereoscope for Large Field-of-View 3D Imaging in Microsurgical Settings.
Adv Photonics Nexus. 2025 Jun;4(4). doi: 10.1117/1.apn.4.4.046008. Epub 2025 Jun 30.

References

1. Accurate, dense, and robust multiview stereopsis.
IEEE Trans Pattern Anal Mach Intell. 2010 Aug;32(8):1362-76. doi: 10.1109/TPAMI.2009.161.