Suppr 超能文献


A simple monocular depth estimation network for balancing complexity and accuracy.

Authors

Liu Xuanxuan, Tang Shuai, Feng Mengdie, Guo Xueqi, Zhang Yanru, Wang Yan

Affiliations

Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, 518000, Guangdong, Shenzhen, China.

School of Future Technology, South China University of Technology, 511442, Guangdong, Guangzhou, China.

Publication

Sci Rep. 2025 Apr 15;15(1):12860. doi: 10.1038/s41598-025-97568-1.

DOI: 10.1038/s41598-025-97568-1
PMID: 40229487
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11997066/
Abstract

Monocular depth estimation plays a crucial role in many downstream visual tasks. Although research on monocular depth estimation is relatively mature, existing methods commonly increase computational complexity and parameter counts to achieve superior performance. In practical applications especially, improving depth-prediction accuracy while preserving computational efficiency remains challenging. To tackle this challenge, we propose a novel and simple depth estimation model, SimMDE, which treats monocular depth estimation as an ordinal regression problem. Starting from a baseline encoder, the model is equipped with a Deformable Cross-Attention Feature Fusion (DCF) decoder with sparse attention, which efficiently integrates multi-scale feature maps and markedly reduces the quadratic complexity of the Transformer. To extract finer local features, we propose a Local Multi-dimensional Convolutional Attention (LMC) module, and we introduce a Wavelet Attention Transformer (WAT) module to achieve pixel-level precise classification of images. Extensive experiments on two widely recognized depth estimation benchmark datasets, NYU and KITTI, demonstrate that our model attains exceptional accuracy in depth estimation while upholding high computational efficiency. Notably, SimMDE, which extends AdaBins, reduces the absolute relative error (AbsRel) by 11.7% on NYU and 10.3% on KITTI, respectively, with fewer parameters.
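The abstract's headline numbers are reductions in absolute relative error (AbsRel), the standard metric on the NYU and KITTI benchmarks. As a reference point, here is a minimal NumPy sketch of the usual AbsRel definition (the function name and masking threshold are illustrative, not from the paper):

```python
import numpy as np

def abs_rel(pred, gt, eps=1e-8):
    """Mean absolute relative error: mean(|pred - gt| / gt) over valid pixels."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    mask = gt > eps  # ignore pixels with no ground-truth depth
    return float(np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask]))

# Toy check: one pixel off by 100%, one exact -> mean relative error 0.5
print(abs_rel([2.0, 4.0], [1.0, 4.0]))  # → 0.5
```

Lower is better; the 11.7% and 10.3% figures in the abstract are relative reductions of this value versus the AdaBins baseline.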

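The abstract casts depth estimation as ordinal regression with pixel-level classification. In AdaBins-style formulations, which SimMDE extends, the continuous depth range is divided into bins and each pixel's depth is decoded as the probability-weighted average of the bin centers. A hedged sketch of that decoding step (shapes and names are illustrative, not the paper's actual interface):

```python
import numpy as np

def depth_from_bins(logits, bin_centers):
    """Decode per-pixel depth from bin scores.

    logits:      (H, W, K) per-pixel scores over K depth bins
    bin_centers: (K,) representative depth of each bin, in meters
    Returns an (H, W) depth map: the softmax-weighted mean of bin centers.
    """
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))  # numerically stable softmax
    probs = e / e.sum(axis=-1, keepdims=True)
    return probs @ bin_centers  # contract the bin axis

# Toy example: a 1x1 "image" with uniform scores over 3 bins
centers = np.array([1.0, 2.0, 3.0])
print(depth_from_bins(np.zeros((1, 1, 3)), centers))  # → [[2.]]
```

Turning regression into classification over bins is what makes the "pixel-level precise classification" of the WAT module applicable to a continuous quantity like depth.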

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b194/11997066/7b9ab06f22d8/41598_2025_97568_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b194/11997066/eb9b1472cb7f/41598_2025_97568_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b194/11997066/6316fdb27171/41598_2025_97568_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b194/11997066/197aaa899250/41598_2025_97568_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b194/11997066/bb847b428db8/41598_2025_97568_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b194/11997066/73b3f8994e59/41598_2025_97568_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b194/11997066/d2ecfa4b1377/41598_2025_97568_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b194/11997066/0b713e862a69/41598_2025_97568_Fig8_HTML.jpg

Similar Articles

1
A simple monocular depth estimation network for balancing complexity and accuracy.
Sci Rep. 2025 Apr 15;15(1):12860. doi: 10.1038/s41598-025-97568-1.
2
BinsFormer: Revisiting Adaptive Bins for Monocular Depth Estimation.
IEEE Trans Image Process. 2024;33:3964-3976. doi: 10.1109/TIP.2024.3416065. Epub 2024 Jun 28.
3
Residual Vision Transformer and Adaptive Fusion Autoencoders for Monocular Depth Estimation.
Sensors (Basel). 2024 Dec 26;25(1):80. doi: 10.3390/s25010080.
4
GFI-Net: Global Feature Interaction Network for Monocular Depth Estimation.
Entropy (Basel). 2023 Feb 26;25(3):421. doi: 10.3390/e25030421.
5
Monocular Depth Estimation via Self-Supervised Self-Distillation.
Sensors (Basel). 2024 Jun 24;24(13):4090. doi: 10.3390/s24134090.
6
A Novel Method for Monocular Depth Estimation Using an Hourglass Neck Module.
Sensors (Basel). 2024 Feb 18;24(4):1312. doi: 10.3390/s24041312.
7
RT-ViT: Real-Time Monocular Depth Estimation Using Lightweight Vision Transformers.
Sensors (Basel). 2022 May 19;22(10):3849. doi: 10.3390/s22103849.
8
Monocular Depth Estimation Using a Laplacian Image Pyramid with Local Planar Guidance Layers.
Sensors (Basel). 2023 Jan 11;23(2):845. doi: 10.3390/s23020845.
9
AMENet is a monocular depth estimation network designed for automatic stereoscopic display.
Sci Rep. 2024 Mar 11;14(1):5868. doi: 10.1038/s41598-024-56095-1.
10
Lightweight monocular depth estimation using a fusion-improved transformer.
Sci Rep. 2024 Sep 28;14(1):22472. doi: 10.1038/s41598-024-72682-8.

References Cited in This Article

1
Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-Shot Metric Depth and Surface Normal Estimation.
IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):10579-10596. doi: 10.1109/TPAMI.2024.3444912. Epub 2024 Nov 6.
2
BinsFormer: Revisiting Adaptive Bins for Monocular Depth Estimation.
IEEE Trans Image Process. 2024;33:3964-3976. doi: 10.1109/TIP.2024.3416065. Epub 2024 Jun 28.
3
Adaptive Surface Normal Constraint for Geometric Estimation From Monocular Images.
IEEE Trans Pattern Anal Mach Intell. 2024 Sep;46(9):6263-6279. doi: 10.1109/TPAMI.2024.3381710. Epub 2024 Aug 6.
4
Lifelong-MonoDepth: Lifelong Learning for Multidomain Monocular Metric Depth Estimation.
IEEE Trans Neural Netw Learn Syst. 2025 Jan;36(1):796-806. doi: 10.1109/TNNLS.2023.3323487. Epub 2025 Jan 7.
5
Deep Ordinal Regression Network for Monocular Depth Estimation.
Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2018 Jun;2018:2002-2011. doi: 10.1109/CVPR.2018.00214. Epub 2018 Dec 17.