Department of Electrical and Computer Engineering, University of Texas at Dallas, Richardson, TX 75080, USA.
Sensors (Basel). 2020 May 20;20(10):2905. doi: 10.3390/s20102905.
Existing public domain multi-modal datasets for human action recognition only include actions of interest that have already been segmented from action streams. These datasets cannot be used to study a more realistic action recognition scenario in which actions of interest occur randomly and continuously among actions of non-interest or no actions. Recognizing actions of interest in continuous action streams is more challenging since the starts and ends of these actions are not known in advance and need to be determined on the fly. Furthermore, there exists no public domain multi-modal dataset in which video and inertial data are captured simultaneously for continuous action streams. The main objective of this paper is to describe a dataset that is collected and made publicly available, named Continuous Multimodal Human Action Dataset (C-MHAD), in which video and inertial data streams are captured simultaneously and continuously. This dataset is then used with an example recognition technique, and the results obtained indicate that fusing these two sensing modalities increases the F1 scores compared to using each sensing modality individually.
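The abstract does not specify how the two modalities are combined, so as a minimal sketch, the snippet below assumes score-level (late) fusion, averaging per-class probabilities from two modality-specific classifiers, and shows a segment-level F1 computation. The function names, the weighting rule, and the toy numbers are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def fuse_scores(video_probs, inertial_probs, alpha=0.5):
    """Hypothetical late fusion: weighted average of per-class
    probabilities from two modality-specific classifiers."""
    return alpha * video_probs + (1.0 - alpha) * inertial_probs

def f1_score(tp, fp, fn):
    """F1 from segment-level counts of true positives,
    false positives, and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

# Toy example: per-class probabilities for one time window.
video_probs = np.array([0.6, 0.3, 0.1])
inertial_probs = np.array([0.2, 0.7, 0.1])
fused = fuse_scores(video_probs, inertial_probs)
print(fused.argmax())          # predicted class index after fusion
print(f1_score(tp=8, fp=2, fn=1))  # F1 for hypothetical detections
```

In such a scheme, each modality's classifier can fail independently (e.g., occlusion in video, sensor drift in inertial data), which is one common rationale for why fused scores tend to outperform either modality alone.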