IEEE Trans Pattern Anal Mach Intell. 2022 Jul;44(7):3436-3449. doi: 10.1109/TPAMI.2021.3054886. Epub 2022 Jun 3.
Dynamic vision sensors (event cameras) have recently been introduced to solve a number of vision tasks such as object recognition, activity recognition, and tracking. Compared with traditional RGB sensors, event cameras have many unique advantages, such as ultra-low resource consumption, high temporal resolution, and a much larger dynamic range. However, these cameras only produce noisy and asynchronous events of intensity changes, i.e., event streams rather than frames, to which conventional computer vision algorithms cannot be directly applied. In our opinion, the key challenge in improving the performance of event cameras on vision tasks is finding appropriate representations of the event streams so that cutting-edge learning approaches can be applied to fully uncover the spatio-temporal information they contain. In this paper, we focus on the event-based human gait identification task and investigate possible representations of the event streams when deep neural networks are applied as the classifier. We propose new event-based gait recognition approaches based on two different representations of the event stream, i.e., graph and image-like representations, and use a graph convolutional network (GCN) and a convolutional neural network (CNN), respectively, to recognize gait from the event streams. The two approaches are termed EV-Gait-3DGraph and EV-Gait-IMG. To evaluate the performance of the proposed approaches, we collect two event-based gait datasets, one from real-world experiments and the other by converting the publicly available RGB gait recognition benchmark CASIA-B. Extensive experiments show that EV-Gait-3DGraph achieves significantly higher recognition accuracy than competing methods when sufficient training samples are available. However, EV-Gait-IMG converges more quickly during training than the graph-based approach and shows good accuracy with only a small number of training samples (fewer than ten). The image-like representation is therefore preferable when the amount of training data is limited.
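To make the image-like representation concrete, below is a minimal sketch of one common way to turn an event stream into a frame a CNN can consume: accumulating per-pixel event counts, one channel per polarity. The (x, y, t, p) event layout, the two-channel design, and the function name are assumptions for illustration; the paper's actual EV-Gait-IMG encoding may differ, for example by adding timestamp-based channels.

import numpy as np

def events_to_count_image(events, height, width):
    """Accumulate an event stream into a 2-channel count image,
    one channel per polarity. Hypothetical helper, not the paper's
    exact encoding.

    events: (N, 4) array of (x, y, t, p) rows with p in {0, 1}.
    """
    img = np.zeros((2, height, width), dtype=np.float32)
    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    p = events[:, 3].astype(np.int64)
    # Unbuffered accumulation so repeated (p, y, x) indices all count.
    np.add.at(img, (p, y, x), 1.0)
    # Normalize to [0, 1] so the CNN sees a bounded input.
    if img.max() > 0:
        img /= img.max()
    return img

Collapsing time into counts is what makes this representation cheap to train on, but it also discards fine temporal structure, which is consistent with the abstract's finding that the graph representation wins when enough training data is available.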
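For the graph representation, the sketch below shows one common way to build a spatio-temporal event graph: treating each event as a node at (x, y, scaled t) and connecting nodes that fall within a fixed radius in that 3D space. The subsampling step, the time-scaling factor, and the radius are illustrative assumptions; the construction used by EV-Gait-3DGraph (its sampling strategy, edge rules, and node features) may differ.

import numpy as np
from scipy.spatial import cKDTree

def events_to_graph(events, time_scale=1e-3, radius=3.0, max_events=2000):
    """Build a radius-neighborhood graph over an event stream.
    Hypothetical construction; parameter names and defaults are
    illustrative, not taken from the paper.

    events: (N, 4) array of (x, y, t, p) rows.
    Returns node coordinates (M, 3) and undirected edges (E, 2).
    """
    # Subsample so the graph stays tractable for a GCN.
    if len(events) > max_events:
        keep = np.sort(np.random.choice(len(events), max_events, replace=False))
        events = events[keep]
    t = events[:, 2]
    # Rescale time so it is commensurate with pixel coordinates.
    t_scaled = time_scale * (t - t.min())
    nodes = np.column_stack([events[:, 0], events[:, 1], t_scaled])
    # Connect every pair of nodes closer than `radius` in 3D.
    edges = cKDTree(nodes).query_pairs(r=radius, output_type='ndarray')
    return nodes, edges

The polarity column could serve as a node feature for the GCN; shrinking the radius yields a sparser graph, trading connectivity for memory and compute.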