Duan Huimei, Guo Chenggang, Ou Yuan
School of Computer and Software Engineering, Xihua University, Chengdu 610039, China.
Sensors (Basel). 2024 Dec 4;24(23):7752. doi: 10.3390/s24237752.
Monocular depth estimation is a central problem in computer vision and robot vision, aiming to recover the depth of a scene from a single image. In extreme conditions such as highly dynamic scenes or drastic lighting changes, monocular depth estimation methods based on conventional cameras often perform poorly. Event cameras capture brightness changes asynchronously but cannot acquire color or absolute brightness information, so exploiting the complementary strengths of event cameras and conventional cameras is an attractive choice. However, how to effectively fuse event data and frames to improve the accuracy and robustness of monocular depth estimation remains an open problem. To address these challenges, this paper proposes a novel Coordinate Attention Gated Recurrent Unit (CAGRU). Unlike conventional ConvGRUs, the CAGRU abandons the practice of implementing all gates with convolutional layers: it instead designs coordinate attention as an attention gate and combines it with a convolutional gate. Coordinate attention explicitly models inter-channel dependencies together with spatial coordinate information. The attention gate, in conjunction with the convolutional gate, enables the network to model feature information spatially, temporally, and across channels. On this basis, the CAGRU enhances the information density of sparse events in the spatial domain during the temporal recursion, achieving more effective feature screening and fusion. It effectively integrates feature information from event cameras and standard cameras, further improving the accuracy and robustness of monocular depth estimation. Experimental results show that the proposed method achieves significant performance improvements on different public datasets.
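The gating scheme described above can be illustrated with a minimal, dependency-light sketch. This is not the paper's implementation: the abstract does not specify which GRU gate is replaced by coordinate attention, how the learned transforms are parameterized, or the input encoding, so everything below (the choice of replacing the reset gate, the use of simple elementwise mixes instead of learned convolutions, and all function names) is an assumption made purely for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(x):
    # Simplified coordinate attention over a feature map x of shape (C, H, W).
    # As in the coordinate-attention idea, the map is pooled along each spatial
    # axis separately, preserving positional information along the other axis.
    # The learned 1x1 convolutions of a real attention block are omitted here
    # (assumption: identity transforms) to keep the sketch self-contained.
    pool_h = x.mean(axis=2, keepdims=True)   # (C, H, 1): per-row descriptor
    pool_w = x.mean(axis=1, keepdims=True)   # (C, 1, W): per-column descriptor
    # Broadcasting the two gated descriptors yields a (C, H, W) attention map
    # with values in (0, 1) that encodes both channel and coordinate structure.
    return sigmoid(pool_h) * sigmoid(pool_w)

def cagru_cell(x, h_prev):
    # One recurrent step of a toy CAGRU-style cell. x and h_prev have shape
    # (C, H, W). A real ConvGRU computes its gates with learned convolutions;
    # here they are replaced by fixed elementwise sums (assumption) so the
    # structure, not the training, is what the sketch demonstrates.
    z = sigmoid(x + h_prev)                  # convolutional update gate (simplified)
    r = coordinate_attention(x + h_prev)     # attention gate in place of the conv reset gate
    h_cand = np.tanh(x + r * h_prev)         # candidate state modulated by the attention gate
    return (1.0 - z) * h_prev + z * h_cand   # standard GRU state interpolation

# Usage: recurse over a sequence of (sparse) event feature slices, letting the
# hidden state accumulate spatial information density over time.
rng = np.random.default_rng(0)
h = np.zeros((4, 8, 8))
for _ in range(5):
    h = cagru_cell(rng.standard_normal((4, 8, 8)), h)
```

The design point the sketch makes concrete is the hybrid gating: the update gate stays a plain (here, degenerate) convolutional gate operating uniformly over the map, while the reset path is driven by an attention map that is factorized along the H and W axes, so the recurrence can re-weight the hidden state differently per channel, per row, and per column.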