Shou Zhaoyu, Yuan Xiaohu, Li Dongxu, Mo Jianwen, Zhang Huibing, Zhang Jingwei, Wu Ziyong
School of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China.
Guangxi Wireless Broadband Communication and Signal Processing Key Laboratory, Guilin University of Electronic Technology, Guilin 541004, China.
Sensors (Basel). 2024 Aug 20;24(16):5371. doi: 10.3390/s24165371.
Given the intricacy of student classroom meta-actions, precisely recognizing complete meta-actions is a crucial challenge for the tailored, adaptive interpretation of student behavior. This paper proposes a Dynamic Position Embedding-based model for Student classroom complete meta-Action Recognition (DPE-SAR) built on the Video Swin Transformer. The model applies a dynamic position embedding technique to perform conditional positional encoding and incorporates a deep convolutional network to improve the parsing of the spatial structure of meta-actions. The full attention mechanism of ViT3D is used to extract latent spatial features of actions and to capture the global spatiotemporal information of meta-actions. Evaluations on public datasets and a smart classroom meta-action recognition dataset show that the proposed model outperforms baseline action recognition models, and the experimental results confirm its superiority in meta-action recognition.
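To illustrate the conditional positional encoding idea mentioned in the abstract, below is a minimal PyTorch sketch of a dynamic position embedding for video tokens, assuming a PEG-style design (a depthwise 3D convolution over the token grid, as in conditional positional encodings for vision transformers). The class name, kernel size, and tensor layout are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class ConditionalPositionalEncoding3D(nn.Module):
    """Hypothetical sketch: dynamic position embedding via conditional
    positional encoding. A depthwise 3D convolution over the spatiotemporal
    token grid generates position information from local neighborhoods, so
    the encoding adapts to the input instead of using a fixed lookup table."""

    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        # Depthwise 3D conv; the zero padding breaks translation invariance,
        # which is what allows the convolution to encode position.
        self.proj = nn.Conv3d(dim, dim, kernel_size,
                              padding=kernel_size // 2, groups=dim)

    def forward(self, x: torch.Tensor, grid: tuple) -> torch.Tensor:
        # x: (batch, num_tokens, dim); grid = (T, H, W), T*H*W == num_tokens
        b, n, c = x.shape
        t, h, w = grid
        feat = x.transpose(1, 2).reshape(b, c, t, h, w)
        feat = self.proj(feat) + feat  # residual keeps the token content
        return feat.reshape(b, c, n).transpose(1, 2)

# Usage example: 8 frames of 7x7 patch tokens with 96-dim embeddings
tokens = torch.randn(2, 8 * 7 * 7, 96)
cpe = ConditionalPositionalEncoding3D(96)
out = cpe(tokens, grid=(8, 7, 7))
print(out.shape)  # torch.Size([2, 392, 96])
```

In such a design, the module is typically inserted once after an early transformer block, so later full-attention layers (e.g., the ViT3D attention the abstract describes) operate on position-aware tokens.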