OD-MVSNet: Omni-dimensional dynamic multi-view stereo network.

Affiliations

College of Information Science and Electrical Engineering, Shandong Jiaotong University, Jinan, Shandong, China.

Shandong Zhengyuan Yeda Environmental Technology Co., Ltd., Jinan, Shandong, China.

Publication Information

PLoS One. 2024 Aug 15;19(8):e0309029. doi: 10.1371/journal.pone.0309029. eCollection 2024.

Abstract

Learning-based multi-view stereo is a critical task in three-dimensional reconstruction, enabling effective inference of depth maps and recovery of fine-grained scene geometry. However, the results produced by current popular 3D reconstruction methods are imprecise, and high-accuracy scene reconstruction remains challenging because of limitations in feature extraction and weak correlation within the cost volume. To address these issues, we propose a cascaded deep residual inference network that improves both the efficiency and the accuracy of multi-view stereo depth estimation. The approach builds a cost-volume pyramid from coarse to fine, yielding a lightweight, compact network with improved reconstruction results. Specifically, we introduce omni-dimensional dynamic atrous spatial pyramid pooling (OSPP), a multiscale feature-extraction module that generates dense feature maps carrying multiscale contextual information; feature maps encoded by the OSPP module support dense point-cloud generation without consuming significant memory. Furthermore, to alleviate feature mismatch in cost-volume regularization, we propose a normalization-based 3D attention module that aggregates the crucial information in the cost volume across the channel, spatial, and depth dimensions. In extensive experiments on benchmark datasets, notably DTU, OD-MVSNet reduces accuracy error by approximately 1.4%, completeness error by 0.9%, and overall error by 1.2% relative to the baseline model, demonstrating the effectiveness of our modules.
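
The coarse-to-fine cost-volume pyramid described in the abstract is common to cascade MVS networks: each stage upsamples the previous depth estimate and re-samples depth hypotheses in a narrowed interval around it. The PyTorch sketch below illustrates only that hypothesis-refinement step under stated assumptions; the function name refine_depth_hypotheses, the shrink factor, and the uniform hypothesis spacing are illustrative choices, not the paper's implementation.

```python
import torch

def refine_depth_hypotheses(depth_prev: torch.Tensor,
                            interval_prev: float,
                            num_hyp: int,
                            shrink: float = 0.5) -> torch.Tensor:
    """Sample per-pixel depth hypotheses for the next (finer) stage.

    depth_prev:    (B, H, W) depth map from the coarser stage, already
                   upsampled to the finer stage's resolution.
    interval_prev: spacing between hypotheses at the coarser stage.
    Returns:       (B, num_hyp, H, W) hypotheses centered on depth_prev.
    """
    interval = interval_prev * shrink  # narrow the search range per stage
    offsets = (torch.arange(num_hyp, dtype=depth_prev.dtype,
                            device=depth_prev.device)
               - (num_hyp - 1) / 2) * interval
    # Broadcast: (B, 1, H, W) + (1, num_hyp, 1, 1) -> (B, num_hyp, H, W)
    return depth_prev.unsqueeze(1) + offsets.view(1, -1, 1, 1)
```

At each finer stage a new cost volume is built from these hypotheses, so depth resolution is spent only in the interval the coarse estimate already identified.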
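OSPP combines atrous spatial pyramid pooling with omni-dimensional dynamic convolution. As a rough orientation, the sketch below shows only the ASPP half: parallel dilated convolutions at several rates whose outputs are concatenated and fused. The class name ASPPSketch, the dilation rates, and the 1x1 fusion layer are assumptions for illustration; the paper's module additionally makes the kernels dynamic along all kernel dimensions, which is omitted here.

```python
import torch
import torch.nn as nn

class ASPPSketch(nn.Module):
    """Illustrative ASPP-style multiscale feature module (not the paper's OSPP).

    Parallel atrous (dilated) convolutions at several rates capture
    multiscale context; branch outputs are concatenated and fused.
    """
    def __init__(self, in_ch: int, out_ch: int, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3,
                          padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        # 1x1 convolution fuses the concatenated multiscale branches.
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
```

Because every branch preserves spatial resolution, the fused output remains a dense feature map, which is what permits dense point-cloud generation downstream without a heavy decoder.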
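The normalization-based 3D attention module reweights the cost volume using importance scores derived from normalization statistics. The sketch below follows the general idea of normalization-based attention (using BatchNorm scale factors as channel-importance weights) applied to a 5D cost volume; the class name NormAttention3D and the sigmoid gating are assumptions, and the paper's module may treat the spatial and depth axes differently.

```python
import torch
import torch.nn as nn

class NormAttention3D(nn.Module):
    """Illustrative normalization-based channel attention for a cost volume.

    Input is a 5D cost volume (B, C, D, H, W). The learned BatchNorm3d
    scale factors (gamma) indicate per-channel importance; normalizing
    them yields attention weights applied through a sigmoid gate, so
    informative channels are emphasized across depth and spatial dims.
    """
    def __init__(self, channels: int):
        super().__init__()
        self.bn = nn.BatchNorm3d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.bn(x)
        gamma = self.bn.weight.abs()
        weights = gamma / gamma.sum()           # per-channel importance
        out = out * weights.view(1, -1, 1, 1, 1)
        return x * torch.sigmoid(out)           # gated reweighting of the input
```

A cost volume of shape (B, C, D, H, W), e.g. torch.randn(1, 8, 48, 32, 40), passes through with its shape unchanged but with informative channels amplified across all depth planes and pixels.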

Figure 1 (pone.0309029.g001): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/595c/11326553/233a8eb2b5f4/pone.0309029.g001.jpg
