Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-Shot Metric Depth and Surface Normal Estimation.

Authors

Hu Mu, Yin Wei, Zhang Chi, Cai Zhipeng, Long Xiaoxiao, Chen Hao, Wang Kaixuan, Yu Gang, Shen Chunhua, Shen Shaojie

Publication information

IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):10579-10596. doi: 10.1109/TPAMI.2024.3444912. Epub 2024 Nov 6.

Abstract

We introduce Metric3D v2, a geometric foundation model designed for zero-shot metric depth and surface normal estimation from single images, critical for accurate 3D recovery. Depth and normal estimation, though complementary, present distinct challenges. State-of-the-art monocular depth methods achieve zero-shot generalization through affine-invariant depths, but fail to recover real-world metric scale. Conversely, current normal estimation techniques struggle with zero-shot performance due to insufficient labeled data. We propose targeted solutions for both metric depth and normal estimation. For metric depth, we present a canonical camera space transformation module that resolves metric ambiguity across various camera models and large-scale datasets, which can be easily integrated into existing monocular models. For surface normal estimation, we introduce a joint depth-normal optimization module that leverages diverse data from metric depth, allowing normal estimators to improve beyond traditional labels. Our model, trained on over 16 million images from thousands of camera models with varied annotations, excels in zero-shot generalization to new camera settings. As shown in Fig. 1, it ranks first on multiple zero-shot and standard benchmarks for metric depth and surface normal prediction. Our method enables the accurate recovery of metric 3D structures on randomly collected internet images, paving the way for plausible single-image metrology. Our model also alleviates the scale drift issues of monocular SLAM (Fig. 3), leading to high-quality metric-scale dense mapping. Such applications highlight the versatility of Metric3D v2 models as geometric foundation models.
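
The abstract does not spell out how the canonical camera space transformation works. The following is a minimal sketch of one plausible reading, assuming that the metric ambiguity stems from differing focal lengths and that depth values are rescaled by the ratio of a fixed canonical focal length to the real one. The function names and the canonical focal length of 1000.0 pixels are hypothetical and are not taken from the paper.

```python
# Hypothetical canonical focal length in pixels (an assumption for illustration).
CANONICAL_FOCAL = 1000.0


def to_canonical(depth_m, focal_px, canonical_focal=CANONICAL_FOCAL):
    """Rescale a metric depth map (meters) as if it were seen by the canonical camera.

    depth_m is a list of rows of depth values; focal_px is the real focal length.
    """
    scale = canonical_focal / focal_px
    return [[d * scale for d in row] for row in depth_m]


def from_canonical(depth_c, focal_px, canonical_focal=CANONICAL_FOCAL):
    """Invert to_canonical: map a canonical-space depth map back to metric depth."""
    scale = focal_px / canonical_focal
    return [[d * scale for d in row] for row in depth_c]


if __name__ == "__main__":
    depth = [[2.0, 4.0], [1.0, 8.0]]   # toy metric depth map in meters
    canonical = to_canonical(depth, focal_px=500.0)
    restored = from_canonical(canonical, focal_px=500.0)
    print(canonical)
    print(restored)
```

Under these assumptions, a network trained on canonical-space depths sees a single virtual camera, and the real metric depth is recovered at inference time by applying the inverse scale for the actual focal length.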

