

Histogram of Oriented Gradient-Based Fusion of Features for Human Action Recognition in Action Video Sequences.

Author Information

Patel Chirag I, Labana Dileep, Pandya Sharnil, Modi Kirit, Ghayvat Hemant, Awais Muhammad

Affiliations

Computer Science & Engineering, Parul Institute of Technology, Parul University, Vadodara 391760, India.

Symbiosis Centre for Applied Artificial Intelligence and Symbiosis Institute of Technology, Symbiosis International (Deemed) University, Pune 412115, India.

Publication Information

Sensors (Basel). 2020 Dec 18;20(24):7299. doi: 10.3390/s20247299.

DOI: 10.3390/s20247299
PMID: 33353248
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7766717/
Abstract

Human Action Recognition (HAR) is the classification of an action performed by a human. The goal of this study was to recognize human actions in action video sequences. We present a novel feature descriptor for HAR that involves multiple features and combines them using a fusion technique. The major focus of the feature descriptor is to exploit action dissimilarities. The key contribution of the proposed approach is to build a robust feature descriptor that works for the underlying video sequences and various classification models. To achieve this objective, HAR is performed as follows. First, the moving object is detected and segmented from the background. Features are then calculated from the segmented moving object using the histogram of oriented gradients (HOG). To reduce the descriptor size, we average the HOG features across non-overlapping video frames. For frequency-domain information, we calculate regional features from the Fourier HOG. Moreover, we also include the velocity and displacement of the moving object. Finally, we use a fusion technique to combine these features. Once the feature descriptor is prepared, it is provided to a classifier. Here, we use well-known classifiers such as artificial neural networks (ANNs), support vector machines (SVMs), multiple kernel learning (MKL), the Meta-cognitive Neural Network (McNN), and late fusion methods. The main objective of the proposed approach is to prepare a robust feature descriptor and to demonstrate its diversity: although we use five different classifiers, our feature descriptor performs relatively well across all of them.
The proposed approach is evaluated and compared with state-of-the-art action recognition methods on two publicly available benchmark datasets (KTH and Weizmann), with cross-validation on the UCF11, HMDB51, and UCF101 datasets. Results of control experiments, such as changing the SVM classifier and the effect of a second hidden layer in the ANN, are also reported. The results demonstrate that the proposed method performs reasonably well compared with the majority of existing state-of-the-art methods, including convolutional neural network-based feature extractors.
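The abstract's descriptor-shrinking step — a HOG vector per frame, averaged over non-overlapping groups of frames — can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses a simplified single-cell HOG (one global 9-bin orientation histogram per frame, no cell/block normalization), and the group size of 5 frames and the synthetic input are illustrative assumptions.

```python
import numpy as np

def hog_frame(frame, n_bins=9):
    """Simplified single-cell HOG: a 9-bin histogram of unsigned
    gradient orientations, weighted by gradient magnitude.
    (The paper uses a full cell/block HOG; this is a sketch.)"""
    gy, gx = np.gradient(frame.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    hist, _ = np.histogram(ang, bins=n_bins, range=(0.0, 180.0), weights=mag)
    s = hist.sum()
    return hist / s if s > 0 else hist

def averaged_hog(frames, group_size=5):
    """Average per-frame HOG vectors over non-overlapping groups of
    `group_size` frames, reducing the descriptor size as described
    in the abstract (group_size is an assumed value)."""
    per_frame = np.stack([hog_frame(f) for f in frames])
    n_groups = len(frames) // group_size
    trimmed = per_frame[:n_groups * group_size]
    # One averaged histogram per non-overlapping frame group.
    return trimmed.reshape(n_groups, group_size, -1).mean(axis=1)

# Toy stand-in for 20 segmented 32x32 grayscale frames.
rng = np.random.default_rng(0)
frames = rng.random((20, 32, 32))
desc = averaged_hog(frames)
print(desc.shape)  # (4, 9): 4 frame groups, one 9-bin histogram each
```

In the paper this averaged HOG block is then fused with Fourier HOG, velocity, and displacement features before classification; the fusion and classifier stages are omitted here.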


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a23/7766717/c322dbf37bcd/sensors-20-07299-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a23/7766717/9ad9cde7bccb/sensors-20-07299-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a23/7766717/197c8ac50a4e/sensors-20-07299-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a23/7766717/3e6df1f7aef7/sensors-20-07299-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a23/7766717/e0b9e4f354b6/sensors-20-07299-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a23/7766717/4dfc62404be6/sensors-20-07299-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a23/7766717/ffd41ceaffb9/sensors-20-07299-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a23/7766717/953b8271f64b/sensors-20-07299-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a23/7766717/b8c720bdf9df/sensors-20-07299-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a23/7766717/da875a8f1537/sensors-20-07299-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a23/7766717/581f720bb03c/sensors-20-07299-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a23/7766717/cb119a81feea/sensors-20-07299-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a23/7766717/a67f7d77a477/sensors-20-07299-g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a23/7766717/4f944aaf7774/sensors-20-07299-g014.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a23/7766717/32dea1b5c849/sensors-20-07299-g015.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a23/7766717/683aa19f8ad5/sensors-20-07299-g016.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a23/7766717/a7d1b2d33d02/sensors-20-07299-g017.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a23/7766717/d26746c02853/sensors-20-07299-g018.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a23/7766717/a10e9516819b/sensors-20-07299-g019.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a23/7766717/4e033a7adc1f/sensors-20-07299-g020.jpg

Similar Articles

1. Histogram of Oriented Gradient-Based Fusion of Features for Human Action Recognition in Action Video Sequences.
   Sensors (Basel). 2020 Dec 18;20(24):7299. doi: 10.3390/s20247299.
2. Robust video content analysis schemes for human action recognition.
   Sci Prog. 2021 Apr-Jun;104(2):368504211005480. doi: 10.1177/00368504211005480.
3. Automated classification of nasal polyps in endoscopy video-frames using handcrafted and CNN features.
   Comput Biol Med. 2022 Aug;147:105725. doi: 10.1016/j.compbiomed.2022.105725. Epub 2022 Jun 13.
4. Vision-Based HAR in UAV Videos Using Histograms and Deep Learning Techniques.
   Sensors (Basel). 2023 Feb 25;23(5):2569. doi: 10.3390/s23052569.
5. Complex Human Action Recognition Using a Hierarchical Feature Reduction and Deep Learning-Based Method.
   SN Comput Sci. 2021;2(2):94. doi: 10.1007/s42979-021-00484-0. Epub 2021 Feb 13.
6. A multidimensional feature fusion network based on MGSE and TAAC for video-based human action recognition.
   Neural Netw. 2023 Nov;168:496-507. doi: 10.1016/j.neunet.2023.09.031. Epub 2023 Sep 22.
7. Enhancing Detection Quality Rate with a Combined HOG and CNN for Real-Time Multiple Object Tracking across Non-Overlapping Multiple Cameras.
   Sensors (Basel). 2022 Mar 9;22(6):2123. doi: 10.3390/s22062123.
8. Feature Fusion of Deep Spatial Features and Handcrafted Spatiotemporal Features for Human Action Recognition.
   Sensors (Basel). 2019 Apr 2;19(7):1599. doi: 10.3390/s19071599.
9. Design and Development of an Imitation Detection System for Human Action Recognition Using Deep Learning.
   Sensors (Basel). 2023 Dec 18;23(24):9889. doi: 10.3390/s23249889.
10. Using a Selective Ensemble Support Vector Machine to Fuse Multimodal Features for Human Action Recognition.
   Comput Intell Neurosci. 2022 Jan 10;2022:1877464. doi: 10.1155/2022/1877464. eCollection 2022.

Cited By

1. A Comprehensive Methodological Survey of Human Activity Recognition Across Diverse Data Modalities.
   Sensors (Basel). 2025 Jun 27;25(13):4028. doi: 10.3390/s25134028.
2. The analysis of motion recognition model for badminton player movements using machine learning.
   Sci Rep. 2025 May 30;15(1):19030. doi: 10.1038/s41598-025-02771-9.
3. The application of suitable sports games for junior high school students based on deep learning and artificial intelligence.
   Sci Rep. 2025 May 16;15(1):17056. doi: 10.1038/s41598-025-01941-z.
4. A Review of Machine Learning and Deep Learning Methods for Person Detection, Tracking and Identification, and Face Recognition with Applications.
   Sensors (Basel). 2025 Feb 26;25(5):1410. doi: 10.3390/s25051410.
5. A Comprehensive Survey of Machine Learning Techniques and Models for Object Detection.
   Sensors (Basel). 2025 Jan 2;25(1):214. doi: 10.3390/s25010214.
6. A New Multi-Branch Convolutional Neural Network and Feature Map Extraction Method for Traffic Congestion Detection.
   Sensors (Basel). 2024 Jul 1;24(13):4272. doi: 10.3390/s24134272.
7. SAR image matching based on rotation-invariant description.
   Sci Rep. 2023 Sep 4;13(1):14510. doi: 10.1038/s41598-023-41592-6.
8. CNN-LSTM Model for Recognizing Video-Recorded Actions Performed in a Traditional Chinese Exercise.
   IEEE J Transl Eng Health Med. 2023 Jun 2;11:351-359. doi: 10.1109/JTEHM.2023.3282245. eCollection 2023.
9. Human Behavior Recognition via Hierarchical Patches Descriptor and Approximate Locality-Constrained Linear Coding.
   Sensors (Basel). 2023 May 29;23(11):5179. doi: 10.3390/s23115179.
10. HFR-Video-Based Stereo Correspondence Using High Synchronous Short-Term Velocities.
   Sensors (Basel). 2023 Apr 26;23(9):4285. doi: 10.3390/s23094285.

References

1. A Comprehensive Survey of Vision-Based Human Action Recognition Methods.
   Sensors (Basel). 2019 Feb 27;19(5):1005. doi: 10.3390/s19051005.
2. Long-Term Recurrent Convolutional Networks for Visual Recognition and Description.
   IEEE Trans Pattern Anal Mach Intell. 2017 Apr;39(4):677-691. doi: 10.1109/TPAMI.2016.2599174. Epub 2016 Sep 1.
3. Learning Spatio-Temporal Representations for Action Recognition: A Genetic Programming Approach.
   IEEE Trans Cybern. 2016 Jan;46(1):158-70. doi: 10.1109/TCYB.2015.2399172. Epub 2015 Feb 13.
4. Actions as space-time shapes.
   IEEE Trans Pattern Anal Mach Intell. 2007 Dec;29(12):2247-53. doi: 10.1109/TPAMI.2007.70711.