Zhou Liuchen, Liu Xiangpeng, Guan Xiqiang, Cheng Yuhua
College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 201418, China.
Shanghai Research Institute of Microelectronics, Peking University, Shanghai 201203, China.
Sensors (Basel). 2025 May 15;25(10):3132. doi: 10.3390/s25103132.
Under a student-centered educational paradigm, project-based learning (PBL) assessment requires accurate identification of classroom behaviors to support effective teaching evaluation and the implementation of personalized learning strategies. The increasing use of visual and multi-modal sensors in smart classrooms has made it possible to capture rich behavioral data continuously. However, lighting variations, occlusions, and diverse behaviors complicate sensor-based behavior analysis. To address these issues, we introduce CSSA-YOLO, a novel detection network that incorporates cross-scale feature optimization. First, we design a C2fs module that captures spatiotemporal dependencies in small-scale actions such as hand-raising through hierarchical window attention. Second, a Shuffle Attention mechanism is integrated into the neck to suppress interference from complex backgrounds, enhancing the model's ability to focus on relevant features. Finally, to improve the detection of small targets and behaviors with complex boundaries, we adopt the WIoU loss function, which dynamically weights gradients to improve the localization accuracy of occluded targets. Experiments on the SCB03-S dataset show that CSSA-YOLO outperforms traditional methods, achieving an accuracy of 76.0% and surpassing YOLOv8m by 1.2%, particularly in complex-background and occlusion scenarios. Furthermore, it reaches 78.31 FPS, meeting the requirements for real-time application. This study offers a reliable solution for precise behavior recognition in classroom settings, supporting the development of intelligent education systems.
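The abstract names the WIoU loss but does not say which variant is used or how it is configured. As a rough illustration only, the sketch below computes the basic (v1) form of Wise-IoU for a single pair of boxes: the IoU loss 1 - IoU is scaled by a distance-based factor exp(d^2 / (Wg^2 + Hg^2)), where d is the distance between box centers and Wg, Hg are the width and height of the smallest enclosing box. The function name `wiou_v1` and the `(x1, y1, x2, y2)` box layout are assumptions made for this example, not the authors' code. The dynamic focusing coefficient of later WIoU versions is omitted, and in a training framework the normalizing term Wg^2 + Hg^2 is usually treated as a constant with respect to gradients, which a scalar sketch cannot express.

```python
import math


def wiou_v1(pred, target):
    """Illustrative Wise-IoU v1 loss for one box pair.

    Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1 (assumed layout).
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h

    # Union area and plain IoU.
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union if union > 0 else 0.0

    # Width and height of the smallest box enclosing both boxes.
    wg = max(px2, tx2) - min(px1, tx1)
    hg = max(py2, ty2) - min(py1, ty1)

    # Squared distance between the two box centers.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    dist_sq = dx * dx + dy * dy

    # Distance-based factor: equals 1 when the centers coincide and
    # grows as the centers move apart.
    denom = wg * wg + hg * hg
    r_wiou = math.exp(dist_sq / denom) if denom > 0 else 1.0

    return r_wiou * (1.0 - iou)


if __name__ == "__main__":
    print(wiou_v1((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
    print(wiou_v1((0, 0, 2, 2), (1, 1, 3, 3)))  # partial overlap
    print(wiou_v1((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes
```

In this form, a prediction whose center is far from the ground-truth center is penalized more heavily than one with the same IoU but a closer center, which is one way a loss can emphasize localization of partially occluded targets.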