

Action Recognition Using Close-Up of Maximum Activation and ETRI-Activity3D LivingLab Dataset.

Affiliations

Department of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Korea.

Intelligent Robotics Research Division, Electronics and Telecommunications Research Institute, Daejeon 34129, Korea.

Publication Information

Sensors (Basel). 2021 Oct 12;21(20):6774. doi: 10.3390/s21206774.

Abstract

Action recognition models have shown strong performance on various video datasets. Nevertheless, because existing datasets lack rich data on the target actions, they are insufficient for the action recognition applications required by industry. To satisfy this requirement, datasets composed of highly available target actions have been created, but because their video data are generated in a specific environment, they struggle to capture the varied characteristics of actual environments. In this paper, we introduce the new ETRI-Activity3D-LivingLab dataset, which provides action sequences captured in actual environments and helps address the network generalization issue caused by dataset shift. When an action recognition model is trained on the ETRI-Activity3D and KIST SynADL datasets and evaluated on the ETRI-Activity3D-LivingLab dataset, performance can be severely degraded because the datasets were captured in different environment domains. To reduce this dataset shift between the training and testing datasets, we propose a close-up of maximum activation, which magnifies the most activated part of a video input in detail. In addition, we present various experimental results and analyses that illustrate the dataset shift and demonstrate the effectiveness of the proposed method.
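
The abstract describes the close-up of maximum activation only at a high level. The sketch below is an informal illustration, not the authors' implementation: it shows one plausible way to magnify the most activated part of a frame by upsampling a backbone activation map, locating its spatial peak, cropping a window around that peak, and resizing the crop to the network input size. The function name close_up_of_max_activation and the crop_size/out_size parameters are hypothetical.

```python
# Illustrative sketch (not the authors' code) of a "close-up of maximum activation".
# Assumptions: `activation_map` is a coarse (H', W') spatial activation map for one
# frame (e.g., a channel-averaged feature map from a CNN backbone), and `frame` is
# the original (H, W, 3) RGB frame.
import numpy as np
import cv2  # OpenCV, used here only for resizing


def close_up_of_max_activation(frame: np.ndarray,
                               activation_map: np.ndarray,
                               crop_size: int = 112,
                               out_size: int = 224) -> np.ndarray:
    """Crop a window around the most activated location and magnify it."""
    H, W = frame.shape[:2]

    # 1. Upsample the coarse activation map to the frame resolution so the
    #    peak coordinates align with pixel coordinates.
    act = cv2.resize(activation_map.astype(np.float32), (W, H))

    # 2. Locate the spatial position with maximum activation.
    y, x = np.unravel_index(np.argmax(act), act.shape)

    # 3. Take a crop_size x crop_size window centered on the peak,
    #    clamped to the frame boundaries.
    half = crop_size // 2
    y0 = int(np.clip(y - half, 0, max(H - crop_size, 0)))
    x0 = int(np.clip(x - half, 0, max(W - crop_size, 0)))
    crop = frame[y0:y0 + crop_size, x0:x0 + crop_size]

    # 4. Magnify the crop back to the network input size (the "close-up").
    return cv2.resize(crop, (out_size, out_size))
```

In practice, such a close-up would be computed for every frame of a clip, and the magnified clip would be fed to the recognition network together with, or instead of, the full frames.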

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6e3b/8539691/5c9c6ae0e098/sensors-21-06774-g001.jpg
