Shin Jungpil, Hassan Najmul, Miah Abu Saleh Musa, Nishimura Satoshi
School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu 965-8580, Japan.
Sensors (Basel). 2025 Jun 27;25(13):4028. doi: 10.3390/s25134028.
Human Activity Recognition (HAR) systems aim to understand human behavior and assign a label to each action, and they have attracted significant attention in computer vision because of their wide range of applications. HAR can leverage various data modalities, such as RGB images and video, skeleton, depth, infrared, point cloud, event stream, audio, acceleration, and radar signals. Each modality provides unique and complementary information suited to different application scenarios, and numerous studies have accordingly investigated diverse approaches for HAR using these modalities. This paper presents a comprehensive survey of recent advancements in HAR from 2014 to 2025, focusing on Machine Learning (ML) and Deep Learning (DL) approaches categorized by input data modality. To ensure linguistic consistency and academic integrity, the survey includes only peer-reviewed research papers published in English. We review both single-modality and multi-modality techniques, highlighting fusion-based and co-learning frameworks. Additionally, we cover advancements in hand-crafted action features, methods for recognizing human-object interactions, and activity detection. For each modality, the survey provides a detailed dataset description, a summary of the latest HAR systems, a mathematical derivation for evaluating the deep learning models, and comparative results on benchmark datasets. Finally, we offer observations and propose future research directions in HAR.