Robotics and Perception Group, University of Zurich, Zurich, Switzerland.
Nature. 2024 May;629(8014):1034-1040. doi: 10.1038/s41586-024-07409-w. Epub 2024 May 29.
The computer vision algorithms currently used in advanced driver assistance systems rely on image-based RGB cameras, leading to a critical bandwidth-latency trade-off for delivering safe driving experiences. To address this, event cameras have emerged as alternative vision sensors. Event cameras measure intensity changes asynchronously, offering high temporal resolution and sparsity and markedly reducing bandwidth and latency requirements. Despite these advantages, event-camera-based algorithms are either highly efficient but lag behind image-based ones in accuracy, or they sacrifice the sparsity and efficiency of events to achieve comparable results. To overcome this, here we propose a hybrid event- and frame-based object detector that preserves the advantages of each modality and thus does not suffer from this trade-off. Our method exploits the high temporal resolution and sparsity of events and the rich but low-temporal-resolution information in standard images to generate efficient, high-rate object detections, reducing perceptual and computational latency. We show that a 20 frames per second (fps) RGB camera plus an event camera can achieve the same latency as a 5,000-fps camera with the bandwidth of a 45-fps camera, without compromising accuracy. Our approach paves the way for efficient and robust perception in edge-case scenarios by uncovering the potential of event cameras.
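The abstract's claim rests on how event cameras work: each pixel fires an asynchronous event whenever its log-brightness changes by more than a contrast threshold, so static scenes produce no data. The following is a minimal single-pixel sketch of that event-generation model, not the paper's method; the function name, the threshold value `C=0.2`, and the synthetic intensity signal are all illustrative assumptions.

```python
import numpy as np

def generate_events(log_intensity, timestamps, C=0.2):
    """Simulate events for one pixel from a densely sampled log-intensity signal.

    log_intensity : 1-D array of log-brightness samples
    timestamps    : matching 1-D array of sample times (seconds)
    C             : contrast threshold (assumed value, not from the paper)
    Returns a list of (time, polarity) events.
    """
    events = []
    ref = log_intensity[0]  # reference level set at the last event
    for t, L in zip(timestamps[1:], log_intensity[1:]):
        while L - ref >= C:       # brightness rose past the threshold
            ref += C
            events.append((t, +1))
        while ref - L >= C:       # brightness fell past the threshold
            ref -= C
            events.append((t, -1))
    return events

# Synthetic pixel: brightness ramps up for 100 ms, then the scene is static.
t = np.linspace(0.0, 1.0, 5001)           # 5 kHz dense sampling
L = np.where(t < 0.1, 10.0 * t, 1.0)      # linear ramp, then constant
ev = generate_events(np.log1p(L), t)
```

Running this, all events land inside the first 100 ms and none afterwards: the sensor is silent while nothing changes, which is the sparsity that the abstract credits with reducing bandwidth relative to a fixed-rate frame camera.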