Ghosh Suman, Gallego Guillermo
IEEE Trans Pattern Anal Mach Intell. 2025 Oct;47(10):9130-9149. doi: 10.1109/TPAMI.2025.3586559.
Stereopsis has widespread appeal in computer vision and robotics as it is the predominant way by which we perceive depth to navigate our 3D world. Event cameras are novel bio-inspired sensors that detect per-pixel brightness changes asynchronously, with very high temporal resolution and high dynamic range, enabling machine perception under high-speed motion and broad illumination conditions. The high temporal precision also benefits stereo matching, making disparity (depth) estimation a popular research area for event cameras ever since their inception. Over the last 30 years, the field has evolved rapidly, from low-latency, low-power circuit design to current deep learning (DL) approaches driven by the computer vision community. The bibliography is vast and difficult to navigate for non-experts due to its highly interdisciplinary nature. Past surveys have addressed distinct aspects of this topic, either in the context of applications or by focusing only on a specific class of techniques, but have overlooked stereo datasets. This survey provides a comprehensive overview, covering both instantaneous stereo and long-term methods suitable for simultaneous localization and mapping (SLAM), along with theoretical and empirical comparisons. It is the first to extensively review DL methods as well as stereo datasets, even providing practical suggestions for creating new benchmarks to advance the field. The main advantages and challenges faced by event-based stereo depth estimation are also discussed. Despite significant progress, challenges remain in achieving optimal performance not only in accuracy but also in efficiency, a cornerstone of event-based computing. We identify several gaps and propose future research directions. We hope this survey inspires future research in depth estimation with event cameras and related topics, by serving as an accessible entry point for newcomers, as well as a practical guide for seasoned researchers in the community.