Wang Yuping, Zhang Junfei
School of Statistics, Capital University of Economics and Business, Beijing, China.
School of Statistics and Mathematics, Central University of Finance and Economics, Beijing, China.
PeerJ Comput Sci. 2022 Mar 16;8:e923. doi: 10.7717/peerj-cs.923. eCollection 2022.
Classifying multi-dimensional data with complex intrinsic geometry, as in video-based human gesture recognition, is a challenging problem. Manifold structure is a good way to characterize the intrinsic geometry of such data. The recently proposed sparse coding on the Grassmann manifold shows high discriminative power in many visual classification tasks. It represents a video on the Grassmann manifold using the Singular Value Decomposition (SVD) of a data matrix formed by vectorizing each image in the video, but vectorization destroys the video's spatial structure. To keep this spatial structure, a video can instead be represented as a data tensor. In this paper, we first represent human gesture videos on a product Grassmann manifold (PGM) using the Higher Order Singular Value Decomposition (HOSVD) of the data tensor. Each factor manifold characterizes the video from a different perspective, and the factors can be understood as the appearance, horizontal motion, and vertical motion of the gesture video, respectively. We then propose a weighted sparse coding model on the PGM, where the weights model the importance of the factor manifolds. Furthermore, we propose an optimization algorithm that learns the coding coefficients by embedding each factor Grassmann manifold into the space of symmetric matrices. Finally, we give a classification algorithm, and experimental results on three public datasets show that our method is competitive with several strong related methods.
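The representation step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The tensor layout (height x width x frames), the subspace dimensions in `ranks`, the values in `weights`, and all function names are assumptions made for illustration. The abstract does not state which embedding into symmetric matrices is used; the sketch assumes the common projection embedding that maps an orthonormal basis U to U Uᵀ. The sparse coding solver itself is not shown.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-`mode` unfolding: bring that axis to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def pgm_point(video, ranks):
    """Return one orthonormal basis per mode of a 3rd-order tensor.

    Each basis spans the leading left singular subspace of a mode
    unfolding (the factor matrices of an HOSVD), so the tuple of bases
    is a point on a product of Grassmann manifolds.
    """
    factors = []
    for mode, p in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(video, mode), full_matrices=False)
        factors.append(u[:, :p])  # keep the p leading left singular vectors
    return factors


def embed(u):
    """Projection embedding (assumed): subspace span(u) -> symmetric matrix u u^T."""
    return u @ u.T


def weighted_sq_distance(x, y, weights):
    """Weighted sum of squared Frobenius distances between embedded factors."""
    return sum(
        w * np.linalg.norm(embed(a) - embed(b), "fro") ** 2
        for w, a, b in zip(weights, x, y)
    )


# Synthetic stand-ins for two gesture videos (height x width x frames).
rng = np.random.default_rng(0)
video_a = rng.standard_normal((20, 20, 32))
video_b = rng.standard_normal((20, 20, 32))

ranks = (5, 5, 5)          # hypothetical subspace dimension per mode
weights = (0.4, 0.3, 0.3)  # hypothetical importance of each factor manifold

point_a = pgm_point(video_a, ranks)
point_b = pgm_point(video_b, ranks)
distance = weighted_sq_distance(point_a, point_b, weights)
```

In this sketch the weights only scale each factor's contribution to a distance; in the paper they enter the sparse coding objective, whose exact form is not given in the abstract.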