IEEE Trans Pattern Anal Mach Intell. 2020 Oct;42(10):2684-2701. doi: 10.1109/TPAMI.2019.2916873. Epub 2019 May 14.
Research on depth-based human activity analysis has achieved outstanding performance and demonstrated the effectiveness of 3D representations for action recognition. The existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of large-scale training samples, a realistic number of distinct action classes, diversity in camera views, varied environmental conditions, and variety of human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames. This dataset contains 120 different action classes, including daily, mutual, and health-related activities. We evaluate the performance of a series of existing 3D activity analysis methods on this dataset and show the advantage of applying deep learning methods to 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset and propose a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework for this task, which yields promising results for the recognition of novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding.