Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-Shot Metric Depth and Surface Normal Estimation.

Author Information

Hu Mu, Yin Wei, Zhang Chi, Cai Zhipeng, Long Xiaoxiao, Chen Hao, Wang Kaixuan, Yu Gang, Shen Chunhua, Shen Shaojie

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):10579-10596. doi: 10.1109/TPAMI.2024.3444912. Epub 2024 Nov 6.

DOI: 10.1109/TPAMI.2024.3444912
PMID: 39150798
Abstract

We introduce Metric3D v2, a geometric foundation model designed for zero-shot metric depth and surface normal estimation from single images, critical for accurate 3D recovery. Depth and normal estimation, though complementary, present distinct challenges. State-of-the-art monocular depth methods achieve zero-shot generalization through affine-invariant depths, but fail to recover real-world metric scale. Conversely, current normal estimation techniques struggle with zero-shot performance due to insufficient labeled data. We propose targeted solutions for both metric depth and normal estimation. For metric depth, we present a canonical camera space transformation module that resolves metric ambiguity across various camera models and large-scale datasets, which can be easily integrated into existing monocular models. For surface normal estimation, we introduce a joint depth-normal optimization module that leverages diverse data from metric depth, allowing normal estimators to improve beyond traditional labels. Our model, trained on over 16 million images from thousands of camera models with varied annotations, excels in zero-shot generalization to new camera settings. As shown in Fig. 1, it ranks first in multiple zero-shot and standard benchmarks for metric depth and surface normal prediction. Our method enables the accurate recovery of metric 3D structures on randomly collected internet images, paving the way for plausible single-image metrology. Our model also relieves the scale drift issues of monocular SLAM (Fig. 3), leading to high-quality metric-scale dense mapping. Such applications highlight the versatility of Metric3D v2 models as geometric foundation models.
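
To make the canonical camera space idea concrete, the Python sketch below shows one plausible reading of it: metric depth is rescaled by the ratio between a fixed canonical focal length and the real camera's focal length, and predictions are mapped back at inference time. The constant CANONICAL_FOCAL, the function names, and the exact scaling convention are illustrative assumptions, not details taken from the paper.

import numpy as np

# Illustrative sketch (not the paper's code): resolve metric ambiguity across
# camera models by expressing depth in a canonical camera with a fixed focal
# length. The canonical focal value below is an arbitrary assumption.
CANONICAL_FOCAL = 1000.0  # assumed canonical focal length, in pixels


def depth_to_canonical(depth_m: np.ndarray, focal_px: float) -> np.ndarray:
    """Rescale a metric depth map (meters) into the canonical camera space."""
    return depth_m * (CANONICAL_FOCAL / focal_px)


def depth_from_canonical(canonical_depth: np.ndarray, focal_px: float) -> np.ndarray:
    """Map a canonical-space depth prediction back to metric depth for the real camera."""
    return canonical_depth * (focal_px / CANONICAL_FOCAL)


if __name__ == "__main__":
    depth = np.full((4, 4), 2.5)  # toy ground-truth depth: 2.5 m everywhere
    for focal in (500.0, 1500.0):
        canonical = depth_to_canonical(depth, focal)
        restored = depth_from_canonical(canonical, focal)
        assert np.allclose(restored, depth)
        print(f"focal={focal:.0f}px  canonical depth={canonical[0, 0]:.2f}")

The round trip only demonstrates that such a transform is invertible once the focal length is known; how the transform interacts with image resizing and training supervision is described in the paper itself.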


Similar Articles

1. Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-Shot Metric Depth and Surface Normal Estimation.
   IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):10579-10596. doi: 10.1109/TPAMI.2024.3444912. Epub 2024 Nov 6.
2. Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction.
   IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):7282-7295. doi: 10.1109/TPAMI.2021.3097396. Epub 2022 Sep 14.
3. Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer.
   IEEE Trans Pattern Anal Mach Intell. 2022 Mar;44(3):1623-1637. doi: 10.1109/TPAMI.2020.3019967. Epub 2022 Feb 3.
4. Towards Accurate Reconstruction of 3D Scene Shape From A Single Monocular Image.
   IEEE Trans Pattern Anal Mach Intell. 2023 May;45(5):6480-6494. doi: 10.1109/TPAMI.2022.3209968. Epub 2023 Apr 3.
5. SLAM-based dense surface reconstruction in monocular Minimally Invasive Surgery and its application to Augmented Reality.
   Comput Methods Programs Biomed. 2018 May;158:135-146. doi: 10.1016/j.cmpb.2018.02.006. Epub 2018 Feb 8.
6. Monocular Depth Estimation with Augmented Ordinal Depth Relationships.
   IEEE Trans Image Process. 2018 Oct 24. doi: 10.1109/TIP.2018.2877944.
7. Adaptive Surface Normal Constraint for Geometric Estimation From Monocular Images.
   IEEE Trans Pattern Anal Mach Intell. 2024 Sep;46(9):6263-6279. doi: 10.1109/TPAMI.2024.3381710. Epub 2024 Aug 6.
8. GFI-Net: Global Feature Interaction Network for Monocular Depth Estimation.
   Entropy (Basel). 2023 Feb 26;25(3):421. doi: 10.3390/e25030421.
9. Superb Monocular Depth Estimation Based on Transfer Learning and Surface Normal Guidance.
   Sensors (Basel). 2020 Aug 27;20(17):4856. doi: 10.3390/s20174856.
10. DiT-SLAM: Real-Time Dense Visual-Inertial SLAM with Implicit Depth Representation and Tightly-Coupled Graph Optimization.
    Sensors (Basel). 2022 Apr 28;22(9):3389. doi: 10.3390/s22093389.

Articles Citing This Paper

1. Application of Image Computing in Non-Destructive Detection of Chinese Cuisine.
   Foods. 2025 Jul 16;14(14):2488. doi: 10.3390/foods14142488.
2. Infrared Monocular Depth Estimation Based on Radiation Field Gradient Guidance and Semantic Priors in HSV Space.
   Sensors (Basel). 2025 Jun 27;25(13):4022. doi: 10.3390/s25134022.
3. A simple monocular depth estimation network for balancing complexity and accuracy.
   Sci Rep. 2025 Apr 15;15(1):12860. doi: 10.1038/s41598-025-97568-1.
4. Recognition and localization of ratoon rice rolled stubble rows based on monocular vision and model fusion.
   Front Plant Sci. 2025 Jan 31;16:1533206. doi: 10.3389/fpls.2025.1533206. eCollection 2025.