Gao Yue, Lu Jiaxuan, Li Siqi, Ma Nan, Du Shaoyi, Li Yipeng, Dai Qionghai
IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):14081-14097. doi: 10.1109/TPAMI.2023.3300741. Epub 2023 Nov 3.
Recent years have witnessed remarkable achievements in video-based action recognition. Unlike traditional frame-based cameras, event cameras are bio-inspired vision sensors that record only pixel-wise brightness changes rather than absolute brightness values. However, little effort has been devoted to event-based action recognition, and large-scale public datasets are scarce. In this paper, we propose an event-based action recognition framework called EV-ACT. We first propose the Learnable Multi-Fused Representation (LMFR), which integrates multiple types of event information in a learnable manner. The LMFR, constructed at dual temporal granularities, is fed into an event-based slow-fast network to fuse appearance and motion features. A spatial-temporal attention mechanism is introduced to further enhance the learning capability for action recognition. To promote research in this direction, we have collected the largest event-based action recognition benchmark, named THU-50, together with the accompanying THU-50-CHL dataset recorded under challenging environments, comprising over 12,830 recordings from 50 action categories in total, more than 4 times the size of the previous largest dataset. Experimental results show that our proposed framework achieves improvements of over 14.5%, 7.6%, 11.2%, and 7.4% compared to previous works on four benchmarks. We have also deployed the proposed EV-ACT framework on a mobile platform to validate its practicality and efficiency.
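To make the event-camera data model concrete, the sketch below accumulates a stream of events, each a tuple of pixel coordinates, timestamp, and polarity, into a simple two-channel count image. This is a generic, fixed event representation for illustration only; the paper's LMFR performs this integration in a learnable manner, and the function name and event format here are assumptions, not the authors' implementation.

```python
import numpy as np

def events_to_frame(events, height, width):
    """Accumulate events (x, y, t, polarity) into a 2-channel count
    image: channel 0 counts positive (brightness-increase) events,
    channel 1 counts negative ones. Illustrative sketch, not the
    paper's learnable LMFR."""
    frame = np.zeros((2, height, width), dtype=np.float32)
    for x, y, t, p in events:
        channel = 0 if p > 0 else 1
        frame[channel, y, x] += 1.0
    return frame

# Tiny synthetic event stream: (x, y, timestamp, polarity)
events = [(3, 2, 0.01, +1), (3, 2, 0.02, +1), (5, 1, 0.03, -1)]
frame = events_to_frame(events, height=4, width=8)
print(frame[0, 2, 3])  # 2.0: two positive events fired at (x=3, y=2)
print(frame[1, 1, 5])  # 1.0: one negative event fired at (x=5, y=1)
```

Because an event camera reports only asynchronous brightness changes, some such aggregation step is typically needed before a frame-based network can consume the data; learnable representations like LMFR replace the fixed counting rule with learned weights.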