Duan Huimei, Guo Chenggang, Ou Yuan
School of Computer and Software Engineering, Xihua University, Chengdu 610039, China.
Sensors (Basel). 2024 Dec 4;24(23):7752. doi: 10.3390/s24237752.
Monocular depth estimation is a central problem in computer vision and robot vision, aiming to recover the depth of a scene from a single image. In extreme conditions such as highly dynamic scenes or drastic lighting changes, monocular depth estimation methods based on conventional cameras often perform poorly. Event cameras capture brightness changes asynchronously but cannot acquire color or absolute brightness information, so exploiting the complementary strengths of event cameras and conventional cameras is an attractive choice. However, how to effectively fuse event data and frames to improve the accuracy and robustness of monocular depth estimation remains an open problem. To address these challenges, this paper proposes a novel Coordinate Attention Gated Recurrent Unit (CAGRU). Unlike conventional ConvGRUs, the CAGRU abandons the practice of implementing all gates with convolutional layers: it instead designs coordinate attention as an attention gate and combines it with a convolutional gate. Coordinate attention explicitly models inter-channel dependencies together with spatial coordinate information. The attention gate, in conjunction with the convolutional gate, enables the network to model feature information spatially, temporally, and across channels. On this basis, the CAGRU enhances the information density of sparse events in the spatial domain during the temporal recursion, achieving more effective feature screening and fusion. It effectively integrates feature information from event cameras and standard cameras, further improving the accuracy and robustness of monocular depth estimation. Experimental results show that the proposed method achieves significant performance improvements on different public datasets.
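The gating scheme described above can be illustrated with a minimal, dependency-light sketch. This is not the paper's implementation: the abstract does not specify which GRU gate is replaced by coordinate attention, how the learned transforms are parameterized, or the input encoding, so everything below (the choice of replacing the reset gate, the use of simple elementwise mixes instead of learned convolutions, and all function names) is an assumption made purely for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(x):
    # Simplified coordinate attention over a feature map x of shape (C, H, W).
    # As in the coordinate-attention idea, the map is pooled along each spatial
    # axis separately, preserving positional information along the other axis.
    # The learned 1x1 convolutions of a real attention block are omitted here
    # (assumption: identity transforms) to keep the sketch self-contained.
    pool_h = x.mean(axis=2, keepdims=True)   # (C, H, 1): per-row descriptor
    pool_w = x.mean(axis=1, keepdims=True)   # (C, 1, W): per-column descriptor
    # Broadcasting the two gated descriptors yields a (C, H, W) attention map
    # with values in (0, 1) that encodes both channel and coordinate structure.
    return sigmoid(pool_h) * sigmoid(pool_w)

def cagru_cell(x, h_prev):
    # One recurrent step of a toy CAGRU-style cell. x and h_prev have shape
    # (C, H, W). A real ConvGRU computes its gates with learned convolutions;
    # here they are replaced by fixed elementwise sums (assumption) so the
    # structure, not the training, is what the sketch demonstrates.
    z = sigmoid(x + h_prev)                  # convolutional update gate (simplified)
    r = coordinate_attention(x + h_prev)     # attention gate in place of the conv reset gate
    h_cand = np.tanh(x + r * h_prev)         # candidate state modulated by the attention gate
    return (1.0 - z) * h_prev + z * h_cand   # standard GRU state interpolation

# Usage: recurse over a sequence of (sparse) event feature slices, letting the
# hidden state accumulate spatial information density over time.
rng = np.random.default_rng(0)
h = np.zeros((4, 8, 8))
for _ in range(5):
    h = cagru_cell(rng.standard_normal((4, 8, 8)), h)
```

The design point the sketch makes concrete is the hybrid gating: the update gate stays a plain (here, degenerate) convolutional gate operating uniformly over the map, while the reset path is driven by an attention map that is factorized along the H and W axes, so the recurrence can re-weight the hidden state differently per channel, per row, and per column.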