
Residual Vision Transformer and Adaptive Fusion Autoencoders for Monocular Depth Estimation

Authors

Yang Wei-Jong, Wu Chih-Chen, Yang Jar-Ferr

Affiliations

Department of Artificial Intelligence and Computer Engineering, National Chin-Yi University of Technology, Taichung 411, Taiwan.

Institute of Computer and Communication Engineering, Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Taiwan.

Publication

Sensors (Basel). 2024 Dec 26;25(1):80. doi: 10.3390/s25010080.

DOI: 10.3390/s25010080
PMID: 39796871
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11722566/
Abstract

Precision depth estimation plays a key role in many applications, including 3D scene reconstruction, virtual reality, autonomous driving, and human-computer interaction. Thanks to recent advances in deep learning, monocular depth estimation, with its simplicity, has surpassed traditional stereo camera systems, opening new possibilities in 3D sensing. In this paper, using a single camera, we propose an end-to-end supervised monocular depth estimation autoencoder, which contains an encoder that mixes a convolutional neural network with vision transformers and an effective adaptive fusion decoder, to obtain high-precision depth maps. In the encoder, we construct a multi-scale feature extractor by mixing residual configurations of vision transformers to enhance both local and global information. In the adaptive fusion decoder, we introduce adaptive fusion modules that effectively merge the features of the encoder and the decoder. Lastly, the model is trained with a loss function aligned with human perception, so that it focuses on the depth values of foreground objects. The experimental results demonstrate that the proposed autoencoder effectively predicts the depth map from a single-view color image, increasing the first accuracy rate by about 28% and reducing the root mean square error by about 27% compared to an existing method on the NYU dataset.
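The two evaluation metrics the abstract reports — the first accuracy rate (the fraction of pixels whose predicted/ground-truth depth ratio stays within a threshold of 1.25, commonly written δ1 in depth benchmarks) and the root mean square error — can be sketched in plain Python. These are the standard monocular-depth metric definitions, not code from the paper:

```python
import math

def depth_metrics(pred, gt, threshold=1.25):
    """Standard monocular-depth metrics: delta-1 accuracy and RMSE.

    pred, gt: flat lists of positive depth values (e.g., metres).
    delta-1 counts pixels where max(pred/gt, gt/pred) < threshold.
    """
    assert len(pred) == len(gt) and len(pred) > 0
    within = sum(1 for p, g in zip(pred, gt)
                 if max(p / g, g / p) < threshold)
    delta1 = within / len(pred)
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred))
    return delta1, rmse

# Toy example: three of four pixels fall inside the 1.25 ratio band,
# so delta-1 is 0.75; the fourth pixel (8.0 vs 4.0) dominates the RMSE.
d1, rmse = depth_metrics([1.0, 2.1, 3.0, 8.0], [1.1, 2.0, 3.2, 4.0])
```

A 28% gain in δ1 therefore means many more pixels land inside the ratio band, while a 27% RMSE reduction mostly reflects fewer large per-pixel depth errors.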

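The abstract describes adaptive fusion modules that merge encoder skip features with decoder features, but does not specify their internal design. A minimal conceptual sketch of one common fusion pattern — a learned sigmoid gate that blends the two feature streams, with plain Python scalars standing in for feature-map channels — might look like this (the function and parameter names are illustrative, not the paper's):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def adaptive_fuse(enc_feat, dec_feat, gate_logits):
    """Gated fusion of encoder and decoder features (conceptual sketch).

    Each channel gets a learned weight a = sigmoid(gate_logit); the fused
    channel is a * encoder + (1 - a) * decoder, letting the network decide
    per channel how much fine encoder detail vs. decoded context to keep.
    """
    fused = []
    for e, d, w in zip(enc_feat, dec_feat, gate_logits):
        a = sigmoid(w)
        fused.append(a * e + (1.0 - a) * d)
    return fused

# A gate logit of 0 gives a = 0.5, i.e., an equal mix of the two streams.
out = adaptive_fuse([2.0, 4.0], [0.0, 0.0], [0.0, 0.0])  # → [1.0, 2.0]
```

Unlike a plain skip-connection concatenation, the gate weights here are trainable, which is what makes the fusion "adaptive": the blend ratio can differ per channel and is learned from data.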

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3ee/11722566/1220c5fd7d12/sensors-25-00080-g020.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3ee/11722566/5a207b8d822c/sensors-25-00080-g015.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3ee/11722566/77dc78935b82/sensors-25-00080-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3ee/11722566/8e00f19bcb11/sensors-25-00080-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3ee/11722566/e28d35d74fda/sensors-25-00080-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3ee/11722566/7772ee31cb1c/sensors-25-00080-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3ee/11722566/6a406982502b/sensors-25-00080-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3ee/11722566/171ce18f3a25/sensors-25-00080-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3ee/11722566/6c91d2eae1da/sensors-25-00080-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3ee/11722566/3e07639f0f6e/sensors-25-00080-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3ee/11722566/96fa8a9a5b27/sensors-25-00080-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3ee/11722566/7cf9c133de96/sensors-25-00080-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3ee/11722566/67f8a7d53c8a/sensors-25-00080-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3ee/11722566/8973d5ea595e/sensors-25-00080-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3ee/11722566/88d988788bbd/sensors-25-00080-g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3ee/11722566/f42859d5298c/sensors-25-00080-g014.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3ee/11722566/367f69e4160e/sensors-25-00080-g016.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3ee/11722566/3de89d5c9215/sensors-25-00080-g017.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3ee/11722566/b38be6d44037/sensors-25-00080-g018.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3ee/11722566/5d2cb4d88b2a/sensors-25-00080-g019.jpg

Similar Articles

1
Residual Vision Transformer and Adaptive Fusion Autoencoders for Monocular Depth Estimation.
Sensors (Basel). 2024 Dec 26;25(1):80. doi: 10.3390/s25010080.
2
RT-ViT: Real-Time Monocular Depth Estimation Using Lightweight Vision Transformers.
Sensors (Basel). 2022 May 19;22(10):3849. doi: 10.3390/s22103849.
3
Deep Neural Networks for Accurate Depth Estimation with Latent Space Features.
Biomimetics (Basel). 2024 Dec 9;9(12):747. doi: 10.3390/biomimetics9120747.
4
Monocular Depth Estimation Using a Laplacian Image Pyramid with Local Planar Guidance Layers.
Sensors (Basel). 2023 Jan 11;23(2):845. doi: 10.3390/s23020845.
5
High quality monocular depth estimation with parallel decoder.
Sci Rep. 2022 Oct 5;12(1):16616. doi: 10.1038/s41598-022-20909-x.
6
A simple monocular depth estimation network for balancing complexity and accuracy.
Sci Rep. 2025 Apr 15;15(1):12860. doi: 10.1038/s41598-025-97568-1.
7
Swin-MFA: A Multi-Modal Fusion Attention Network Based on Swin-Transformer for Low-Light Image Human Segmentation.
Sensors (Basel). 2022 Aug 19;22(16):6229. doi: 10.3390/s22166229.
8
Lightweight monocular depth estimation using a fusion-improved transformer.
Sci Rep. 2024 Sep 28;14(1):22472. doi: 10.1038/s41598-024-72682-8.
9
Dense monocular depth estimation for stereoscopic vision based on pyramid transformer and multi-scale feature fusion.
Sci Rep. 2024 Mar 25;14(1):7037. doi: 10.1038/s41598-024-57908-z.
10
GFI-Net: Global Feature Interaction Network for Monocular Depth Estimation.
Entropy (Basel). 2023 Feb 26;25(3):421. doi: 10.3390/e25030421.

Cited by

1
DP-AMF: Depth-Prior-Guided Adaptive Multi-Modal and Global-Local Fusion for Single-View 3D Reconstruction.
J Imaging. 2025 Jul 21;11(7):246. doi: 10.3390/jimaging11070246.

References

1
Monocular Depth Estimation Using Deep Learning: A Review.
Sensors (Basel). 2022 Jul 18;22(14):5353. doi: 10.3390/s22145353.
2
UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation.
IEEE Trans Med Imaging. 2020 Jun;39(6):1856-1867. doi: 10.1109/TMI.2019.2959609. Epub 2019 Dec 13.
3
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs.
IEEE Trans Pattern Anal Mach Intell. 2018 Apr;40(4):834-848. doi: 10.1109/TPAMI.2017.2699184. Epub 2017 Apr 27.
4
Stereo processing by semiglobal matching and mutual information.
IEEE Trans Pattern Anal Mach Intell. 2008 Feb;30(2):328-41. doi: 10.1109/TPAMI.2007.1166.