Unsupervised Monocular Depth Estimation With Channel and Spatial Attention.

Authors

Wang Zhuping, Dai Xinke, Guo Zhanyu, Huang Chao, Zhang Hao

Publication information

IEEE Trans Neural Netw Learn Syst. 2024 Jun;35(6):7860-7870. doi: 10.1109/TNNLS.2022.3221416. Epub 2024 Jun 3.

Abstract

Understanding 3-D scene geometry from videos is a fundamental topic in visual perception. In this article, we propose an unsupervised monocular depth and camera motion estimation framework that uses unlabeled monocular videos, overcoming the difficulty of acquiring per-pixel ground-truth depth at scale. The unsupervised method is based on warping nearby views to the target view using the estimated depth and pose; the resulting photometric loss couples the depth network and pose network together and is essential to the method. We introduce a channelwise attention mechanism to model the relationship between channels and a spatialwise attention mechanism to exploit the inner-spatial relationship of features. Applied in the depth network, both mechanisms better activate the feature information between different convolutional layers and extract more discriminative features. In addition, we apply the Sobel boundary to our edge-aware smoothness for more reasonable accuracy and clearer boundaries and structures. Together, these help close the gap with fully supervised methods, yielding high-quality state-of-the-art results on the KITTI benchmark and strong generalization performance on the Make3D dataset.
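The abstract does not give the formula for the Sobel-based edge-aware smoothness, so the following is only a minimal sketch under stated assumptions. It takes the edge-aware smoothness form common in unsupervised depth work (depth gradients weighted by exp(-|image gradient|)) and assumes that the image gradient is computed with a Sobel operator. The function names, the mean normalization of depth, and the single-channel grayscale input are all illustrative choices, not details taken from the paper.

```python
import numpy as np

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T


def filter3x3(img, kernel):
    """Cross-correlate a 2-D array with a 3x3 kernel using edge padding."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def sobel_edge_aware_smoothness(depth, image):
    """Hypothetical loss: depth gradients down-weighted at Sobel image edges.

    depth: (H, W) predicted depth or disparity.
    image: (H, W) grayscale target frame with values in [0, 1].
    """
    # Assumption: mean-normalize depth so the penalty ignores global scale.
    d = depth / (depth.mean() + 1e-7)
    # First-order depth gradients via forward differences.
    d_dx = np.abs(d[:, 1:] - d[:, :-1])
    d_dy = np.abs(d[1:, :] - d[:-1, :])
    # Image edge strength from the Sobel operator.
    i_dx = np.abs(filter3x3(image, SOBEL_X))
    i_dy = np.abs(filter3x3(image, SOBEL_Y))
    # Weights are 1 in flat image regions and approach 0 on strong edges.
    w_x = np.exp(-i_dx[:, 1:])
    w_y = np.exp(-i_dy[1:, :])
    return float((d_dx * w_x).mean() + (d_dy * w_y).mean())
```

Under these assumptions, a depth discontinuity that coincides with an image edge is penalized far less than the same discontinuity in a textureless region, which is the behavior an edge-aware term is meant to encourage.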
