

Action Recognition and Benchmark Using Event Cameras.

Authors

Gao Yue, Lu Jiaxuan, Li Siqi, Ma Nan, Du Shaoyi, Li Yipeng, Dai Qionghai

Publication

IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):14081-14097. doi: 10.1109/TPAMI.2023.3300741. Epub 2023 Nov 3.

DOI: 10.1109/TPAMI.2023.3300741
PMID: 37527291
Abstract

Recent years have witnessed remarkable achievements in video-based action recognition. Apart from traditional frame-based cameras, event cameras are bio-inspired vision sensors that only record pixel-wise brightness changes rather than the brightness value. However, little effort has been made in event-based action recognition, and large-scale public datasets are also nearly unavailable. In this paper, we propose an event-based action recognition framework called EV-ACT. The Learnable Multi-Fused Representation (LMFR) is first proposed to integrate multiple event information in a learnable manner. The LMFR with dual temporal granularity is fed into the event-based slow-fast network for the fusion of appearance and motion features. A spatial-temporal attention mechanism is introduced to further enhance the learning capability of action recognition. To prompt research in this direction, we have collected the largest event-based action recognition benchmark named THU-50 and the accompanying THU-50-CHL dataset under challenging environments, including a total of over 12,830 recordings from 50 action categories, which is over 4 times the size of the previous largest dataset. Experimental results show that our proposed framework could achieve improvements of over 14.5%, 7.6%, 11.2%, and 7.4% compared to previous works on four benchmarks. We have also deployed our proposed EV-ACT framework on a mobile platform to validate its practicality and efficiency.
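The abstract's LMFR integrates multiple event representations with learned fusion weights; the paper itself does not spell out the construction here. As context, event cameras emit a stream of (x, y, timestamp, polarity) tuples, and a common fixed (non-learnable) way to turn such a stream into a frame-like tensor is a time-binned "voxel grid". The sketch below is that generic baseline only, not the paper's LMFR; the function name and event layout are illustrative assumptions.

```python
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """Accumulate an event stream into a (num_bins, H, W) tensor.

    events: array of shape (N, 4) with columns (x, y, t, polarity),
    polarity in {-1, +1}. This is a generic fixed time binning used
    for illustration; the paper's LMFR instead learns how multiple
    event representations are fused.
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = events[:, 2].astype(float)
    p = events[:, 3]
    # Normalize timestamps to [0, num_bins) and assign each event a bin.
    t0, t1 = t.min(), t.max()
    span = max(t1 - t0, 1e-9)
    bins = ((t - t0) / span * (num_bins - 1e-6)).astype(int)
    # Signed accumulation: positive and negative polarity events
    # add and subtract at their pixel location within their time bin.
    np.add.at(grid, (bins, y, x), p)
    return grid
```

Feeding two such grids built at coarse and fine temporal granularity to separate branches is one simple way to realize the "dual temporal granularity" input that the abstract's slow-fast network consumes.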


Similar articles

1. Action Recognition and Benchmark Using Event Cameras.
   IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):14081-14097. doi: 10.1109/TPAMI.2023.3300741. Epub 2023 Nov 3.
2. Hypergraph-Based Multi-View Action Recognition Using Event Cameras.
   IEEE Trans Pattern Anal Mach Intell. 2024 Oct;46(10):6610-6622. doi: 10.1109/TPAMI.2024.3382117. Epub 2024 Sep 6.
3. SuperFast: 200× Video Frame Interpolation via Event Camera.
   IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):7764-7780. doi: 10.1109/TPAMI.2022.3224051. Epub 2023 May 5.
4. Event-Based Vision: A Survey.
   IEEE Trans Pattern Anal Mach Intell. 2022 Jan;44(1):154-180. doi: 10.1109/TPAMI.2020.3008413. Epub 2021 Dec 7.
5. Multi-Stage Network for Event-Based Video Deblurring with Residual Hint Attention.
   Sensors (Basel). 2023 Mar 7;23(6):2880. doi: 10.3390/s23062880.
6. Event-Stream Representation for Human Gaits Identification Using Deep Neural Networks.
   IEEE Trans Pattern Anal Mach Intell. 2022 Jul;44(7):3436-3449. doi: 10.1109/TPAMI.2021.3054886. Epub 2022 Jun 3.
7. FLGR: Fixed Length Gists Representation Learning for RNN-HMM Hybrid-Based Neuromorphic Continuous Gesture Recognition.
   Front Neurosci. 2019 Feb 12;13:73. doi: 10.3389/fnins.2019.00073. eCollection 2019.
8. A Deep Sequence Learning Framework for Action Recognition in Small-Scale Depth Video Dataset.
   Sensors (Basel). 2022 Sep 9;22(18):6841. doi: 10.3390/s22186841.
9. VisEvent: Reliable Object Tracking via Collaboration of Frame and Event Flows.
   IEEE Trans Cybern. 2024 Mar;54(3):1997-2010. doi: 10.1109/TCYB.2023.3318601. Epub 2024 Feb 9.
10. EV-LFV: Synthesizing Light Field Event Streams from an Event Camera and Multiple RGB Cameras.
   IEEE Trans Vis Comput Graph. 2023 Nov;29(11):4546-4555. doi: 10.1109/TVCG.2023.3320271. Epub 2023 Nov 2.

Cited by

1. Spike-HAR++: an energy-efficient and lightweight parallel spiking transformer for event-based human action recognition.
   Front Comput Neurosci. 2024 Nov 26;18:1508297. doi: 10.3389/fncom.2024.1508297. eCollection 2024.