Makarov Ilya, Bakhanova Maria, Nikolenko Sergey, Gerasimova Olga
HSE University, Moscow, Russia.
Artificial Intelligence Research Institute (AIRI), Moscow, Russia.
PeerJ Comput Sci. 2022 Jan 31;8:e865. doi: 10.7717/peerj-cs.865. eCollection 2022.
Depth estimation is an essential task for many computer vision applications, especially in autonomous driving, where safety is paramount. Depth can be estimated not only with traditional supervised learning but also via a self-supervised approach that relies on camera motion and does not require ground truth depth maps. Recently, major improvements have been introduced to make self-supervised depth prediction more precise. However, most existing approaches still focus on single-frame depth estimation, even in the self-supervised setting. Since most methods can operate on frame sequences, we believe that the quality of current models can be significantly improved with the help of information about previous frames. In this work, we study different ways of integrating recurrent blocks and attention mechanisms into a common self-supervised depth estimation pipeline. We propose a set of modifications that utilize temporal information from previous frames and provide new neural network architectures for self-supervised monocular depth estimation. Our experiments on the KITTI dataset show that the proposed modifications can be an effective tool for exploiting temporal information in a depth prediction pipeline.
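To illustrate what a recurrent block that carries information from previous frames can look like, here is a minimal NumPy sketch. It is not the architecture proposed in the paper, which the abstract does not specify. It assumes a GRU cell applied independently at every pixel of a per-frame feature map (equivalent to a convolutional GRU with 1x1 kernels), with random untrained weights. The names `PixelwiseGRU` and `run_sequence` are hypothetical.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


class PixelwiseGRU:
    """GRU cell applied per pixel (a ConvGRU with 1x1 kernels). Illustrative only."""

    def __init__(self, in_ch, hid_ch, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(in_ch + hid_ch)
        shape = (hid_ch, in_ch + hid_ch)
        # One weight matrix per gate, acting on the concatenated [x, h] channels.
        self.w_z = rng.normal(0.0, scale, shape)  # update gate
        self.w_r = rng.normal(0.0, scale, shape)  # reset gate
        self.w_n = rng.normal(0.0, scale, shape)  # candidate state
        self.hid_ch = hid_ch

    def step(self, x, h):
        """x: (in_ch, H, W) features of the current frame; h: (hid_ch, H, W) state."""
        xh = np.concatenate([x, h], axis=0)
        z = sigmoid(np.einsum("oc,chw->ohw", self.w_z, xh))
        r = sigmoid(np.einsum("oc,chw->ohw", self.w_r, xh))
        xrh = np.concatenate([x, r * h], axis=0)
        n = np.tanh(np.einsum("oc,chw->ohw", self.w_n, xrh))
        # Blend the previous state with the candidate computed from the new frame.
        return (1.0 - z) * h + z * n


def run_sequence(cell, frame_features):
    """Fold a list of per-frame feature maps into one temporally aware state."""
    spatial = frame_features[0].shape[1:]
    h = np.zeros((cell.hid_ch,) + spatial)
    for x in frame_features:
        h = cell.step(x, h)
    return h


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    frames = [rng.normal(size=(4, 6, 5)) for _ in range(3)]
    state = run_sequence(PixelwiseGRU(in_ch=4, hid_ch=8), frames)
    print(state.shape)
```

In a depth pipeline, a state like this would typically be fed to the depth decoder in place of (or alongside) the single-frame encoder features, so the prediction for the current frame depends on earlier frames as well.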