

Dense monocular depth estimation for stereoscopic vision based on pyramid transformer and multi-scale feature fusion.

Authors

Xia Zhongyi, Wu Tianzhao, Wang Zhuoyan, Zhou Man, Wu Boqi, Chan C Y, Kong Ling Bing

Affiliations

College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, Guangdong, China.

College of Applied Technology, Shenzhen University, Shenzhen, 518000, Guangdong, China.

Publication

Sci Rep. 2024 Mar 25;14(1):7037. doi: 10.1038/s41598-024-57908-z.

DOI: 10.1038/s41598-024-57908-z
PMID: 38528098
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10963766/
Abstract

Stereoscopic display technology plays a significant role in industries such as film, television, and autonomous driving. The accuracy of depth estimation is crucial for achieving high-quality, realistic stereoscopic display effects. To address the inherent challenges of applying Transformers to depth estimation, the Stereoscopic Pyramid Transformer-Depth (SPT-Depth) method is introduced. It uses stepwise downsampling to acquire both shallow and deep semantic information, which are subsequently fused. Training is divided into fine and coarse convergence stages with distinct strategies and hyperparameters, yielding a substantial reduction in both training and validation losses. In the training strategy, a shift- and scale-invariant mean square error function compensates for the Transformers' lack of translational invariance, and an edge-smoothing function reduces noise in the depth map, enhancing the model's robustness. SPT-Depth achieves a global receptive field while effectively reducing time complexity. Compared with the baseline on the New York University Depth V2 (NYU Depth V2) dataset, it reduces Absolute Relative Error (Abs Rel) by 10% and Root Mean Square Error (RMSE) by 36%; compared with state-of-the-art methods, it reduces RMSE by 17%.
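The two training-strategy ingredients named in the abstract — a shift- and scale-invariant mean square error and an edge-smoothing term — are described only at a high level. Below is a minimal NumPy sketch of one common form of each (a MiDaS-style least-squares alignment before the MSE, and an image-gradient-weighted smoothness penalty); the exact functions used by SPT-Depth may differ, and the names `ssi_mse` and `edge_aware_smoothness` are illustrative, not from the paper.

```python
import numpy as np

def ssi_mse(pred, target):
    """Shift- and scale-invariant MSE (sketch of one common form).

    Aligns the predicted depth map to the ground truth with the
    least-squares optimal scale s and shift b, then computes the MSE,
    so the loss ignores the global scale/shift ambiguity of
    monocular depth.
    """
    p = pred.ravel().astype(float)
    gt = target.ravel().astype(float)
    # Closed-form least squares for (s, b) minimizing ||s*p + b - gt||^2
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, b), *_ = np.linalg.lstsq(A, gt, rcond=None)
    aligned = s * p + b
    return np.mean((aligned - gt) ** 2)

def edge_aware_smoothness(depth, image):
    """Edge-aware smoothness penalty (sketch of one common form).

    Penalizes depth gradients, down-weighted where the RGB image
    itself has strong edges, which suppresses depth-map noise
    without blurring object boundaries.
    """
    dd_x = np.abs(np.diff(depth, axis=1))                    # (H, W-1)
    dd_y = np.abs(np.diff(depth, axis=0))                    # (H-1, W)
    di_x = np.mean(np.abs(np.diff(image, axis=1)), axis=-1)  # (H, W-1)
    di_y = np.mean(np.abs(np.diff(image, axis=0)), axis=-1)  # (H-1, W)
    return np.mean(dd_x * np.exp(-di_x)) + np.mean(dd_y * np.exp(-di_y))
```

Because the prediction is realigned before the error is taken, any depth map that is an affine transform of the ground truth scores a loss of zero; the smoothness term is zero for constant depth and grows only where depth varies across texture-free image regions.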


Figures 1-9 (PMC full text):

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/88f9/10963766/9989496d9b45/41598_2024_57908_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/88f9/10963766/0bcef3dad144/41598_2024_57908_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/88f9/10963766/117024ca9244/41598_2024_57908_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/88f9/10963766/2aa557e188fa/41598_2024_57908_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/88f9/10963766/ff8740220736/41598_2024_57908_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/88f9/10963766/c06ef088005a/41598_2024_57908_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/88f9/10963766/4753043a1876/41598_2024_57908_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/88f9/10963766/35f5ed10acfb/41598_2024_57908_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/88f9/10963766/790d2229bb75/41598_2024_57908_Fig9_HTML.jpg

Similar articles

1. Dense monocular depth estimation for stereoscopic vision based on pyramid transformer and multi-scale feature fusion. Sci Rep. 2024 Mar 25;14(1):7037. doi: 10.1038/s41598-024-57908-z.
2. Monocular Depth Estimation Using a Laplacian Image Pyramid with Local Planar Guidance Layers. Sensors (Basel). 2023 Jan 11;23(2):845. doi: 10.3390/s23020845.
3. AMENet is a monocular depth estimation network designed for automatic stereoscopic display. Sci Rep. 2024 Mar 11;14(1):5868. doi: 10.1038/s41598-024-56095-1.
4. A Novel Method for Monocular Depth Estimation Using an Hourglass Neck Module. Sensors (Basel). 2024 Feb 18;24(4):1312. doi: 10.3390/s24041312.
5. DCPNet: A Densely Connected Pyramid Network for Monocular Depth Estimation. Sensors (Basel). 2021 Oct 13;21(20):6780. doi: 10.3390/s21206780.
6. Deep Monocular Depth Estimation Based on Content and Contextual Features. Sensors (Basel). 2023 Mar 8;23(6):2919. doi: 10.3390/s23062919.
7. GFI-Net: Global Feature Interaction Network for Monocular Depth Estimation. Entropy (Basel). 2023 Feb 26;25(3):421. doi: 10.3390/e25030421.
8. Synthetic Data Enhancement and Network Compression Technology of Monocular Depth Estimation for Real-Time Autonomous Driving System. Sensors (Basel). 2024 Jun 28;24(13):4205. doi: 10.3390/s24134205.
9. Laplacian Pyramid Neural Network for Dense Continuous-Value Regression for Complex Scenes. IEEE Trans Neural Netw Learn Syst. 2021 Nov;32(11):5034-5046. doi: 10.1109/TNNLS.2020.3026669. Epub 2021 Oct 27.
10. RT-ViT: Real-Time Monocular Depth Estimation Using Lightweight Vision Transformers. Sensors (Basel). 2022 May 19;22(10):3849. doi: 10.3390/s22103849.

Cited by

1. Deep-learning-based pyramid-transformer for localized porosity analysis of hot-press sintered ceramic paste. PLoS One. 2024 Sep 4;19(9):e0306385. doi: 10.1371/journal.pone.0306385. eCollection 2024.

References

1. Generalized Pose Decoupled Network for Unsupervised 3D Skeleton Sequence-Based Action Representation Learning. Cyborg Bionic Syst. 2022;2022:0002. doi: 10.34133/cbsystems.0002. Epub 2022 Dec 30.
2. Attention-augmented U-Net (AA-U-Net) for semantic segmentation. Signal Image Video Process. 2023;17(4):981-989. doi: 10.1007/s11760-022-02302-3. Epub 2022 Jul 25.
3. Deep neural networks and image classification in biological vision. Vision Res. 2022 Aug;197:108058. doi: 10.1016/j.visres.2022.108058. Epub 2022 Apr 26.
4. Auto-Rectify Network for Unsupervised Indoor Depth Estimation. IEEE Trans Pattern Anal Mach Intell. 2022 Dec;44(12):9802-9813. doi: 10.1109/TPAMI.2021.3136220. Epub 2022 Nov 7.
5. Human-Machine Collaboration for Medical Image Segmentation. Proc IEEE Int Conf Acoust Speech Signal Process. 2020 May;2020:1040-1044. doi: 10.1109/ICASSP40776.2020.9053555. Epub 2020 May 14.
6. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans Pattern Anal Mach Intell. 2017 Apr;39(4):640-651. doi: 10.1109/TPAMI.2016.2572683. Epub 2016 May 24.