GFI-Net: Global Feature Interaction Network for Monocular Depth Estimation.

Authors

Zhang Cong, Xu Ke, Ma Yanxin, Wan Jianwei

Affiliation

College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China.

Publication information

Entropy (Basel). 2023 Feb 26;25(3):421. doi: 10.3390/e25030421.

DOI:10.3390/e25030421

PMID:36981310

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10047826/

Abstract

Monocular depth estimation recovers the distance from each target in an image scene to the camera plane. However, several problems remain, such as insufficient estimation accuracy, inaccurate localization of details, and depth discontinuity in planes parallel to the camera plane. To solve these problems, we propose the Global Feature Interaction Network (GFI-Net), which aims to utilize geometric features, such as object locations and vanishing points, on a global scale. To capture the interactive information among the width, height, and channel dimensions of the feature map and to expand the global information in the network, we designed a global interactive attention mechanism, which reduces the loss of pixel information and improves the performance of depth estimation. Furthermore, the encoder uses a Transformer to reduce encoding losses and improve the accuracy of depth estimation. Finally, a local-global feature fusion module is designed to improve the depth map's representation of detailed areas. Experimental results on the NYU-Depth-v2 and KITTI datasets show that our model achieves state-of-the-art performance, with full detail recovery and depth continuity on the same plane.
