KNN-Based Machine Learning Classifier Used on Deep Learned Spatial Motion Features for Human Action Recognition.

Authors

Paramasivam Kalaivani, Sindha Mohamed Mansoor Roomi, Balakrishnan Sathya Bama

Affiliations

Department of Electronics and Communication Engineering, Government College of Engineering, Bodinayakanur 625582, Tamilnadu, India.

Department of Electronics and Communication Engineering, Thiagarajar College of Engineering, Madurai 625015, Tamilnadu, India.

Publication

Entropy (Basel). 2023 May 25;25(6):844. doi: 10.3390/e25060844.

Abstract

Human action recognition (HAR) is an essential process in surveillance video analysis, where it is used to understand people's behavior and ensure safety. Most existing HAR methods use computationally heavy networks such as 3D CNNs and two-stream networks. To alleviate the difficulty of implementing and training 3D deep learning networks, which have a large number of parameters, a customized lightweight residual 2D CNN based on a directed acyclic graph, with fewer parameters, was designed from scratch and named HARNet. A novel pipeline for constructing spatial motion data from raw video input is presented for the latent representation learning of human actions. The constructed input is fed to the network so that spatial and motion information are processed simultaneously in a single stream; the latent representation learned at the fully connected layer is then extracted and fed to conventional machine learning classifiers for action recognition. The proposed work was verified empirically, and the experimental results were compared with those of existing methods. The results show that the proposed method outperforms state-of-the-art (SOTA) methods, with a percentage improvement of 2.75% on UCF101, 10.94% on HMDB51, and 0.18% on the KTH dataset.
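The final stage of this pipeline, classifying extracted feature vectors with a conventional classifier such as k-nearest neighbors (KNN), can be illustrated with a minimal sketch. It assumes each video has already been reduced to a fixed-length feature vector; the toy 2-D vectors and the labels "walking" and "boxing" below are hypothetical placeholders, not features produced by HARNet, and the function names are illustrative only. Distances are Euclidean and the label is chosen by majority vote, which is one common KNN configuration and not necessarily the one used in the paper.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical stand-in for a feature vector taken from the network's
# fully connected layer: a fixed-length sequence of floats.
FeatureVector = Sequence[float]


def euclidean(a: FeatureVector, b: FeatureVector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(
    train_x: List[FeatureVector],
    train_y: List[str],
    query: FeatureVector,
    k: int = 3,
) -> str:
    """Predict an action label by majority vote among the k nearest
    training vectors; ties go to the label of the closest tied neighbor."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training samples")
    # Sort all training samples by distance to the query, keep the k closest.
    neighbors: List[Tuple[float, str]] = sorted(
        (euclidean(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    best = max(votes.values())
    # Walk neighbors from nearest to farthest and return the first tied winner.
    for _, label in neighbors:
        if votes[label] == best:
            return label
    raise AssertionError("unreachable")


# Toy (hypothetical) training features and action labels.
train_x = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9], [1.0, 1.0]]
train_y = ["walking", "walking", "boxing", "boxing", "boxing"]

print(knn_predict(train_x, train_y, [0.05, 0.05], k=3))  # walking
print(knn_predict(train_x, train_y, [0.95, 0.95], k=3))  # boxing
```

In practice, a library implementation such as scikit-learn's `KNeighborsClassifier` would typically replace this hand-written loop; the sketch only makes the nearest-neighbor voting step explicit.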


