

InterAcT: A generic keypoints-based lightweight transformer model for recognition of human solo actions and interactions in aerial videos.

Authors

Shah Mubashir, Nawaz Tahir, Nawaz Rab, Rashid Nasir, Ali Muhammad Osama

Affiliations

Department of Mechatronics Engineering, College of Electrical and Mechanical Engineering, National University of Sciences and Technology, Islamabad, Pakistan.

Deep Learning Lab, School of Interdisciplinary Engineering and Science, National University of Sciences and Technology, Islamabad, Pakistan.

Publication

PLoS One. 2025 May 14;20(5):e0323314. doi: 10.1371/journal.pone.0323314. eCollection 2025.

DOI: 10.1371/journal.pone.0323314
PMID: 40367248
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12077785/
Abstract

Human action recognition forms an important part of several aerial security and surveillance applications. Indeed, numerous efforts have been made to solve the problem in an effective and efficient manner. Existing methods, however, are generally aimed to recognize either solo actions or interactions, thus restricting their use to specific scenarios. Additionally, the need remains to devise lightweight and computationally efficient models to make them deployable in real-world applications. To this end, this paper presents a generic lightweight and computationally efficient Transformer network-based model, referred to as InterAcT, that relies on extracted bodily keypoints using YOLO v8 to recognize human solo actions as well as interactions in aerial videos. It features a lightweight architecture with 0.0709M parameters and 0.0389G flops, distinguishing it from the AcT models. An extensive performance evaluation has been performed on two publicly available aerial datasets: Drone Action and UT-Interaction, comprising a total of 18 classes including both solo actions and interactions. The model is optimized and trained on 80% train set, 10% validation set and its performance is evaluated on 10% test set achieving highly encouraging performance on multiple benchmarks, outperforming several state-of-the-art methods. Our model, with an accuracy of 0.9923 outperforms the AcT models (micro: 0.9353, small: 0.9893, base: 0.9907, and large: 0.9558), 2P-GCN (0.9337), LSTM (0.9774), 3D-ResNet (0.9921), and 3D CNN (0.9920). It has the strength to recognize a large number of solo actions and two-person interaction classes both in aerial videos and footage from ground-level cameras (grayscale and RGB).
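The abstract describes a pipeline that feeds pose keypoints (extracted with YOLO v8) into a small Transformer classifier over 18 action/interaction classes. The paper's exact architecture is not reproduced here; the following is only a minimal NumPy sketch of the general idea — one self-attention block over a keypoint sequence, a ReLU feed-forward layer, temporal mean pooling, and a linear head. All dimensions and weights are invented for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # Scaled dot-product self-attention over the frame (time) axis.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = (q @ k.T) / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

# Toy dimensions: 30 frames; 17 COCO-style keypoints x 2 coords = 34 features.
T, D, H, C = 30, 34, 64, 18          # frames, feature dim, hidden dim, classes
rng = np.random.default_rng(0)
x = rng.normal(size=(T, D))          # one keypoint sequence from a pose estimator
Wq, Wk, Wv = [rng.normal(size=(D, D)) * 0.1 for _ in range(3)]
W1 = rng.normal(size=(D, H)) * 0.1   # feed-forward weights
W2 = rng.normal(size=(H, C)) * 0.1   # classification head

h = x + self_attention(x, Wq, Wk, Wv)   # residual attention block, (T, D)
h = np.maximum(h @ W1, 0.0)             # ReLU feed-forward, (T, H)
logits = h.mean(axis=0) @ W2            # temporal mean pooling -> (C,)
pred = int(np.argmax(logits))
print(logits.shape, pred)
```

Attending over frames rather than raw pixels is what keeps keypoint-based models this small: the input per frame is tens of coordinates, not an image.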

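The evaluation protocol in the abstract uses an 80% train / 10% validation / 10% test split. A generic shuffled split such as the following (a sketch, not the authors' code) reproduces those proportions:

```python
import random

def train_val_test_split(items, seed=0, train=0.8, val=0.1):
    # Shuffle deterministically, then cut into 80/10/10 portions.
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    a, b = int(n * train), int(n * (train + val))
    return items[:a], items[a:b], items[b:]

tr, va, te = train_val_test_split(range(100))
print(len(tr), len(va), len(te))  # 80 10 10
```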

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/04ab/12077785/e301cc1b33c6/pone.0323314.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/04ab/12077785/5f3b2d981a6e/pone.0323314.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/04ab/12077785/29450e2789f6/pone.0323314.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/04ab/12077785/6ea3c6132111/pone.0323314.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/04ab/12077785/5e57a93c01db/pone.0323314.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/04ab/12077785/d14ec72ca4d0/pone.0323314.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/04ab/12077785/17b83af60972/pone.0323314.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/04ab/12077785/4e5db2a2dff3/pone.0323314.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/04ab/12077785/5932d7c9f50a/pone.0323314.g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/04ab/12077785/51445fef8e7b/pone.0323314.g010.jpg

Similar articles

1
InterAcT: A generic keypoints-based lightweight transformer model for recognition of human solo actions and interactions in aerial videos.
PLoS One. 2025 May 14;20(5):e0323314. doi: 10.1371/journal.pone.0323314. eCollection 2025.
2
SVM directed machine learning classifier for human action recognition network.
Sci Rep. 2025 Jan 3;15(1):672. doi: 10.1038/s41598-024-83529-7.
3
Two-Stream Modality-Based Deep Learning Approach for Enhanced Two-Person Human Interaction Recognition in Videos.
Sensors (Basel). 2024 Nov 3;24(21):7077. doi: 10.3390/s24217077.
4
A novel YOLO LSTM approach for enhanced human action recognition in video sequences.
Sci Rep. 2025 May 16;15(1):17036. doi: 10.1038/s41598-025-01898-z.
5
Dark-DSAR: Lightweight one-step pipeline for action recognition in dark videos.
Neural Netw. 2024 Nov;179:106622. doi: 10.1016/j.neunet.2024.106622. Epub 2024 Aug 8.
6
CNN-LSTM Model for Recognizing Video-Recorded Actions Performed in a Traditional Chinese Exercise.
IEEE J Transl Eng Health Med. 2023 Jun 2;11:351-359. doi: 10.1109/JTEHM.2023.3282245. eCollection 2023.
7
Real-Time Human Detection for Aerial Captured Video Sequences via Deep Models.
Comput Intell Neurosci. 2018 Feb 12;2018:1639561. doi: 10.1155/2018/1639561. eCollection 2018.
8
Pilot Medical Certification
9
A discriminative multi-modal adaptation neural network model for video action recognition.
Neural Netw. 2025 May;185:107114. doi: 10.1016/j.neunet.2024.107114. Epub 2025 Jan 3.
10
Human-Centric Transformer for Domain Adaptive Action Recognition.
IEEE Trans Pattern Anal Mach Intell. 2025 Feb;47(2):679-696. doi: 10.1109/TPAMI.2024.3429387. Epub 2025 Jan 9.
