Zhang Xiaoli, Nie Jialei, Wei Shoulin, Zhu Guifu, Dai Wei, Yang Can
Key Laboratory of Computer Science, Kunming University of Science and Technology, Kunming 650500, China.
School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China.
Sensors (Basel). 2024 Aug 30;24(17):5640. doi: 10.3390/s24175640.
With the development of educational technology, machine learning and deep learning now provide technical support for traditional classroom observation and assessment. In real classroom scenarios, however, these techniques face several challenges: unclear raw images, complex datasets, multi-target detection errors, and complex interactions between people. To address these problems, a student classroom behavior recognition network that combines super-resolution and target detection is proposed. To cope with unclear original images in the classroom scenario, SRGAN (Super Resolution Generative Adversarial Network for Images) is used to increase image resolution and thereby improve recognition accuracy. To address dataset complexity and multi-target problems, feature extraction is optimized and multi-scale feature recognition is enhanced by introducing AKConv and LASK attention mechanisms into the Backbone module of the YOLOv8s algorithm. To handle complex character interactions, the CBAM attention mechanism is integrated to enhance the recognition of important feature channels and spatial regions. Experiments show that the network can detect six student behaviors (raising a hand, reading, writing, playing on a cell phone, looking down, and leaning on the table) in high-definition images, and its accuracy and robustness are verified. Compared with small-object detection algorithms such as Faster R-CNN, YOLOv5, and YOLOv8s, the network demonstrates good detection performance on low-resolution small objects, on complex datasets with numerous targets, and under occlusion and overlap between students.
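The abstract names CBAM as the mechanism that emphasizes important feature channels and spatial regions, but gives no implementation details. As an illustration only, the following is a minimal PyTorch sketch of the commonly used CBAM formulation (channel attention followed by spatial attention). The class names, the reduction ratio of 16, and the 7x7 spatial kernel are assumptions chosen for the example, not values taken from the paper, and the sketch does not show where the module is placed in the YOLOv8s network.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Weights each channel using pooled global statistics (assumed reduction=16)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 1)
        # Shared MLP, written as 1x1 convolutions so it works on (N, C, 1, 1) tensors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1, bias=False),
            nn.ReLU(),
            nn.Conv2d(hidden, channels, kernel_size=1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))  # global average pool
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))   # global max pool
        return torch.sigmoid(avg + mx)                           # (N, C, 1, 1) weights


class SpatialAttention(nn.Module):
    """Weights each spatial location using channel-wise statistics (assumed 7x7 kernel)."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = torch.mean(x, dim=1, keepdim=True)  # mean over channels
        mx = torch.amax(x, dim=1, keepdim=True)   # max over channels
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # (N, 1, H, W)


class CBAM(nn.Module):
    """Applies channel attention, then spatial attention, to a feature map."""

    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.channel = ChannelAttention(channels, reduction)
        self.spatial = SpatialAttention(kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel(x)     # reweight channels
        return x * self.spatial(x)  # reweight spatial positions


# Example: refine a hypothetical 64-channel backbone feature map.
features = torch.randn(1, 64, 40, 40)
refined = CBAM(64)(features)
print(refined.shape)  # same shape as the input feature map
```

Because both attention maps pass through a sigmoid, the module only rescales existing features and leaves the tensor shape unchanged, which is why a block of this kind can be inserted between existing layers of a detector without altering the layers around it.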