Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2022 Mar;44(3):1623-1637. doi: 10.1109/TPAMI.2020.3019967. Epub 2022 Feb 3.

DOI: 10.1109/TPAMI.2020.3019967
PMID: 32853149
Abstract

The success of monocular depth estimation relies on large and diverse training sets. Due to the challenges associated with acquiring dense ground-truth depth across different environments at scale, a number of datasets with distinct characteristics and biases have emerged. We develop tools that enable mixing multiple datasets during training, even if their annotations are incompatible. In particular, we propose a robust training objective that is invariant to changes in depth range and scale, advocate the use of principled multi-objective learning to combine data from different sources, and highlight the importance of pretraining encoders on auxiliary tasks. Armed with these tools, we experiment with five diverse training datasets, including a new, massive data source: 3D films. To demonstrate the generalization power of our approach we use zero-shot cross-dataset transfer, i.e. we evaluate on datasets that were not seen during training. The experiments confirm that mixing data from complementary sources greatly improves monocular depth estimation. Our approach clearly outperforms competing methods across diverse datasets, setting a new state of the art for monocular depth estimation.
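
The abstract does not spell out the training objective, so the following is only a minimal sketch of one way a loss can be made invariant to depth scale and shift: each map is normalized by its median and mean absolute deviation before the two are compared. The function names (`normalize_depth`, `scale_shift_invariant_loss`), the choice of normalization, and the small `eps` guard are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def normalize_depth(d, eps=1e-12):
    """Remove shift (median) and scale (mean absolute deviation) from a depth map.

    Hypothetical helper: this normalization is an assumption for illustration.
    """
    d = np.asarray(d, dtype=float)
    shift = np.median(d)
    scale = max(np.mean(np.abs(d - shift)), eps)  # guard against constant maps
    return (d - shift) / scale

def scale_shift_invariant_loss(pred, target):
    """Mean absolute error between the normalized prediction and target."""
    return float(np.mean(np.abs(normalize_depth(pred) - normalize_depth(target))))

# Usage: rescaling or shifting the annotations leaves the loss unchanged.
rng = np.random.default_rng(0)
pred = rng.random((4, 5))
target = rng.random((4, 5))
base = scale_shift_invariant_loss(pred, target)
rescaled = scale_shift_invariant_loss(pred, 3.0 * target + 5.0)
print(np.isclose(base, rescaled))
```

Because the normalization is applied to both inputs, two datasets whose annotations differ by an unknown positive scale and offset produce the same loss value, which is the property needed when mixing sources with incompatible depth ranges.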

Similar articles

1
Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer.
IEEE Trans Pattern Anal Mach Intell. 2022 Mar;44(3):1623-1637. doi: 10.1109/TPAMI.2020.3019967. Epub 2022 Feb 3.
2
Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-Shot Metric Depth and Surface Normal Estimation.
IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):10579-10596. doi: 10.1109/TPAMI.2024.3444912. Epub 2024 Nov 6.
3
An efficient encoder-decoder model for portrait depth estimation from single images trained on pixel-accurate synthetic data.
Neural Netw. 2021 Oct;142:479-491. doi: 10.1016/j.neunet.2021.07.007. Epub 2021 Jul 13.
4
SFA-MDEN: Semantic-Feature-Aided Monocular Depth Estimation Network Using Dual Branches.
Sensors (Basel). 2021 Aug 13;21(16):5476. doi: 10.3390/s21165476.
5
EndoSLAM dataset and an unsupervised monocular visual odometry and depth estimation approach for endoscopic videos.
Med Image Anal. 2021 Jul;71:102058. doi: 10.1016/j.media.2021.102058. Epub 2021 Apr 15.
6
Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction.
IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):7282-7295. doi: 10.1109/TPAMI.2021.3097396. Epub 2022 Sep 14.
7
Towards Accurate Reconstruction of 3D Scene Shape From A Single Monocular Image.
IEEE Trans Pattern Anal Mach Intell. 2023 May;45(5):6480-6494. doi: 10.1109/TPAMI.2022.3209968. Epub 2023 Apr 3.
8
A Study on the Generality of Neural Network Structures for Monocular Depth Estimation.
IEEE Trans Pattern Anal Mach Intell. 2024 Apr;46(4):2224-2238. doi: 10.1109/TPAMI.2023.3332407. Epub 2024 Mar 6.
9
Monocular Depth Estimation with Augmented Ordinal Depth Relationships.
IEEE Trans Image Process. 2018 Oct 24. doi: 10.1109/TIP.2018.2877944.
10
What makes the unsupervised monocular depth estimation (UMDE) model training better.
Sci Rep. 2022 Dec 20;12(1):21999. doi: 10.1038/s41598-022-26613-0.

Cited by

1
Who expands the human creative frontier with generative AI: Hive minds or masterminds?
Sci Adv. 2025 Sep 5;11(36):eadu5800. doi: 10.1126/sciadv.adu5800. Epub 2025 Sep 3.
2
Human-like monocular depth biases in deep neural networks.
PLoS Comput Biol. 2025 Aug 19;21(8):e1013020. doi: 10.1371/journal.pcbi.1013020. eCollection 2025 Aug.
3
AI-Based Vehicle State Estimation Using Multi-Sensor Perception and Real-World Data.
Sensors (Basel). 2025 Jul 8;25(14):4253. doi: 10.3390/s25144253.
4
DP-AMF: Depth-Prior-Guided Adaptive Multi-Modal and Global-Local Fusion for Single-View 3D Reconstruction.
J Imaging. 2025 Jul 21;11(7):246. doi: 10.3390/jimaging11070246.
5
Depth from 2D Images: Development and Metrological Evaluation of System Uncertainty Applied to Agricultural Scenarios.
Sensors (Basel). 2025 Jun 17;25(12):3790. doi: 10.3390/s25123790.
6
On the use of deep learning for computer-generated holography.
iScience. 2025 Apr 23;28(5):112507. doi: 10.1016/j.isci.2025.112507. eCollection 2025 May 16.
7
DASNeRF: depth consistency optimization, adaptive sampling, and hierarchical structural fusion for sparse view neural radiance fields.
PLoS One. 2025 May 12;20(5):e0321878. doi: 10.1371/journal.pone.0321878. eCollection 2025.
8
in situ Transformation of Information Into DNA Storage With Microfluidic Very Large-Scale Integration Platform.
Small. 2025 May 2:e2412225. doi: 10.1002/smll.202412225.
9
Haptics-based, higher-order sensory substitution designed for object negotiation in blindness and low vision: Virtual Whiskers.
Disabil Rehabil Assist Technol. 2025 Feb 21:1-20. doi: 10.1080/17483107.2025.2458112.
10
Artificial intelligence-powered 3D analysis of video-based caregiver-child interactions.
Sci Adv. 2025 Feb 14;11(7):eadp4422. doi: 10.1126/sciadv.adp4422. Epub 2025 Feb 19.