Improved Action Recognition with Separable Spatio-Temporal Attention Using Alternative Skeletal and Video Pre-Processing

Affiliation

Department of Computing Technology, University of Alicante, P.O. Box 99, E-03080 Alicante, Spain.

Publication information

Sensors (Basel). 2021 Feb 2;21(3):1005. doi: 10.3390/s21031005.

DOI:10.3390/s21031005

PMID:33540809

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7867344/

Abstract

The potential benefits of recognising activities of daily living from video for active and assisted living have yet to be fully tapped. These technologies can be used for behaviour understanding and for lifelogging, for caregivers and end users alike. The recent publication of realistic datasets for this purpose, such as the Toyota Smarthomes dataset, calls for renewed efforts to improve action recognition. Using the separable spatio-temporal attention network proposed in the literature, this paper introduces a view-invariant normalisation of skeletal pose data and full activity crops for RGB data, which improve the baseline results by 9.5% (on the cross-subject experiments), outperforming state-of-the-art techniques in this field when using the original unmodified skeletal data in the dataset. Our code and data are available online.
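The abstract names two pre-processing steps without describing how they are computed. The following is a minimal illustrative sketch, not the authors' implementation. It assumes that a "full activity crop" is the union of the per-frame person bounding boxes over a clip, and that the view-invariant normalisation can be approximated by centring the skeleton on the hips and rotating it about the vertical axis until the hip line is parallel to the x axis. The function names, the joint indices and the `margin` parameter are hypothetical.

```python
import math


def full_activity_crop(boxes, frame_w, frame_h, margin=0):
    """Return one crop (x1, y1, x2, y2) covering every per-frame box.

    Assumption: `boxes` holds one person box per frame, in pixels,
    as (x1, y1, x2, y2). The result is clamped to the frame size.
    """
    if not boxes:
        raise ValueError("need at least one bounding box")
    x1 = max(0, min(b[0] for b in boxes) - margin)
    y1 = max(0, min(b[1] for b in boxes) - margin)
    x2 = min(frame_w, max(b[2] for b in boxes) + margin)
    y2 = min(frame_h, max(b[3] for b in boxes) + margin)
    return (x1, y1, x2, y2)


def normalise_view(joints, left_hip=0, right_hip=1):
    """Centre 3D joints on the hips and rotate them about the y axis.

    Assumption: `joints` is a list of (x, y, z) tuples with y vertical;
    `left_hip` and `right_hip` are hypothetical joint indices.
    After the rotation the hip line is parallel to the x axis.
    """
    lx, ly, lz = joints[left_hip]
    rx, ry, rz = joints[right_hip]
    cx, cy, cz = (lx + rx) / 2, (ly + ry) / 2, (lz + rz) / 2
    # Angle of the hip line in the horizontal (x, z) plane.
    theta = math.atan2(rz - lz, rx - lx)
    c, s = math.cos(theta), math.sin(theta)
    out = []
    for x, y, z in joints:
        x, y, z = x - cx, y - cy, z - cz
        # Rotate by -theta so the hip line lands on the x axis.
        out.append((c * x + s * z, y, -s * x + c * z))
    return out
```

Under these assumptions, a single crop per clip keeps the whole activity in view even when the person moves, and the rotation removes the dependence of the skeleton on the camera's horizontal viewing angle.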
