
GeometryFormer: Semi-Convolutional Transformer Integrated with Geometric Perception for Depth Completion in Autonomous Driving Scenes

Author Information

Su Siyuan, Wu Jian

Affiliation

National Key Laboratory of Automotive Chassis Integration and Bionics, Jilin University, Changchun 130025, China.

Publication Information

Sensors (Basel). 2024 Dec 18;24(24):8066. doi: 10.3390/s24248066.

DOI: 10.3390/s24248066
PMID: 39771801
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11679245/
Abstract

Depth completion is widely employed in Simultaneous Localization and Mapping (SLAM) and Structure from Motion (SfM), both of which are of great significance to the development of autonomous driving. Recently, methods based on the fusion of vision transformers (ViT) and convolution have raised accuracy to a new level. However, two shortcomings remain. First, to address the poor performance of ViT on fine details, this paper proposes a semi-convolutional vision transformer that optimizes local continuity, and designs a geometric perception module that learns the positional correlation and geometric features of sparse points in three-dimensional space, perceiving the geometric structures in depth maps to improve the recovery of edges and transparent areas. Second, previous methods perform single-stage fusion, directly concatenating or adding the outputs of ViT and convolution; the two are therefore fused incompletely, which generates many outliers and ripples, especially in complex outdoor scenes. This paper proposes a novel double-stage fusion strategy that applies learnable confidence after self-attention to flexibly weight the local features. Our network achieves state-of-the-art (SoTA) performance on the NYU-Depth-v2 and KITTI Depth Completion datasets. Notably, the root mean square error (RMSE) of our model on NYU-Depth-v2 is 87.9 mm, currently the best among all algorithms. At the end of the article, we also verify the model's generalization ability in real road scenes.
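
As a reading aid, here is a minimal PyTorch sketch of the double-stage fusion idea the abstract describes: rather than fusing the ViT and convolution branches with a single concatenation or addition, a confidence map learned after self-attention weights the local convolutional features before the final merge. The module names, channel sizes, and wiring below are illustrative assumptions, not the authors' implementation; a small RMSE helper is included because RMSE is the paper's headline metric.

import torch
import torch.nn as nn


def rmse(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Root mean square error, the metric reported as 87.9 mm on NYU-Depth-v2.
    return torch.sqrt(torch.mean((pred - target) ** 2))


class DoubleStageFusion(nn.Module):
    """Illustrative two-stage fusion of global (attention) and local (conv) features."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.local = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Learnable confidence: a per-pixel weight in [0, 1] for the local branch.
        self.confidence = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Stage 1: global context via self-attention over flattened tokens.
        tokens = x.flatten(2).transpose(1, 2)          # (B, H*W, C)
        g, _ = self.attn(tokens, tokens, tokens)
        g = g.transpose(1, 2).reshape(b, c, h, w)      # back to (B, C, H, W)
        # Stage 2: weight the local features by the learned confidence map,
        # then merge, instead of a single direct concat/add of the two branches.
        local = self.local(x)
        conf = self.confidence(g)                      # (B, 1, H, W)
        return self.merge(torch.cat([g, conf * local], dim=1))


if __name__ == "__main__":
    m = DoubleStageFusion(channels=32)
    out = m(torch.randn(2, 32, 16, 16))
    print(out.shape, rmse(out, torch.zeros_like(out)))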


Figures 1-13 (PMC11679245):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e792/11679245/55f922107024/sensors-24-08066-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e792/11679245/99376e4d2274/sensors-24-08066-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e792/11679245/6322139d3c46/sensors-24-08066-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e792/11679245/42c42975646b/sensors-24-08066-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e792/11679245/fa5e4a798218/sensors-24-08066-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e792/11679245/e4f7b4a7a2fd/sensors-24-08066-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e792/11679245/845aaf9b20f1/sensors-24-08066-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e792/11679245/60f14e8631c5/sensors-24-08066-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e792/11679245/f33d6f5974f6/sensors-24-08066-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e792/11679245/aec75918a839/sensors-24-08066-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e792/11679245/d393639d38e7/sensors-24-08066-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e792/11679245/e501917f775e/sensors-24-08066-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e792/11679245/b3a60d056d9c/sensors-24-08066-g013.jpg

Similar Articles

1. GeometryFormer: Semi-Convolutional Transformer Integrated with Geometric Perception for Depth Completion in Autonomous Driving Scenes.
   Sensors (Basel). 2024 Dec 18;24(24):8066. doi: 10.3390/s24248066.
2. RT-ViT: Real-Time Monocular Depth Estimation Using Lightweight Vision Transformers.
   Sensors (Basel). 2022 May 19;22(10):3849. doi: 10.3390/s22103849.
3. A Transformer-Based Image-Guided Depth-Completion Model with Dual-Attention Fusion Module.
   Sensors (Basel). 2024 Sep 27;24(19):6270. doi: 10.3390/s24196270.
4. Learning Depth with Convolutional Spatial Propagation Network.
   IEEE Trans Pattern Anal Mach Intell. 2020 Oct;42(10):2361-2379. doi: 10.1109/TPAMI.2019.2947374. Epub 2019 Oct 15.
5. Residual Vision Transformer and Adaptive Fusion Autoencoders for Monocular Depth Estimation.
   Sensors (Basel). 2024 Dec 26;25(1):80. doi: 10.3390/s25010080.
6. Monocular Depth Estimation Using a Laplacian Image Pyramid with Local Planar Guidance Layers.
   Sensors (Basel). 2023 Jan 11;23(2):845. doi: 10.3390/s23020845.
7. Lightweight monocular depth estimation using a fusion-improved transformer.
   Sci Rep. 2024 Sep 28;14(1):22472. doi: 10.1038/s41598-024-72682-8.
8. Dense monocular depth estimation for stereoscopic vision based on pyramid transformer and multi-scale feature fusion.
   Sci Rep. 2024 Mar 25;14(1):7037. doi: 10.1038/s41598-024-57908-z.
9. Swin Unet3D: a three-dimensional medical image segmentation network combining vision transformer and convolution.
   BMC Med Inform Decis Mak. 2023 Feb 14;23(1):33. doi: 10.1186/s12911-023-02129-z.
10. Confidence Propagation through CNNs for Guided Sparse Depth Regression.
    IEEE Trans Pattern Anal Mach Intell. 2020 Oct;42(10):2423-2436. doi: 10.1109/TPAMI.2019.2929170. Epub 2019 Jul 17.

Cited By

1. GAC-Net: A Geometric-Attention Fusion Network for Sparse Depth Completion from LiDAR and Image.
   Sensors (Basel). 2025 Sep 4;25(17):5495. doi: 10.3390/s25175495.
