Yang Zhiyuan, Shen Yuanyuan, Shen Yanfei
School of Sport Engineering, Beijing Sport University, Beijing, China.
Front Comput Neurosci. 2024 Feb 19;18:1341234. doi: 10.3389/fncom.2024.1341234. eCollection 2024.
Gestures are a crucial means of communication between individuals and between humans and machines. In football matches, referees communicate their decisions through gestures. Because referees' gestures are diverse and complex, and because players, spectators, and camera angles introduce interference, automated football referee gesture recognition (FRGR) is a challenging task. Existing methods based on visual sensors often fail to deliver satisfactory performance. To tackle the FRGR problem, we develop a deep learning model based on YOLOv8s that integrates three improvement strategies. First, a Global Attention Mechanism (GAM) is employed to direct the model's attention to the hand gestures and minimize background interference. Second, a P2 detection head is integrated into the YOLOv8s model to improve the accuracy of detecting small, distant objects. Third, a new loss function based on the Minimum Point Distance Intersection over Union (MPDIoU) is used to make effective use of anchor boxes that have the same shape but different sizes. Finally, experiments are conducted on a dataset of 1,200 images covering six hand gestures. The proposed method is compared with seven existing models and 10 optimized models. It achieves a precision of 89.3%, a recall of 88.9%, a mAP@0.5 of 89.9%, and a mAP@0.5:0.95 of 77.3%. These values are approximately 1.4%, 2.0%, 1.1%, and 5.4% higher, respectively, than those of the baseline YOLOv8s. The proposed method shows good prospects for automated gesture recognition in football matches.
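The abstract names the MPDIoU-based loss but does not state its formula. The sketch below assumes the commonly cited MPDIoU definition: the IoU of the two boxes minus the squared distances between their top-left corners and between their bottom-right corners, each normalized by the squared diagonal of the input image. The function names `mpdiou` and `mpdiou_loss` and the `(x1, y1, x2, y2)` box layout are illustrative choices, not taken from the paper, and the authors' implementation inside YOLOv8s may differ.

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """MPDIoU between two axis-aligned boxes (assumed definition, see lead-in).

    Boxes are (x1, y1, x2, y2) in pixels with x1 < x2 and y1 < y2.
    img_w and img_h are the width and height of the input image.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Plain IoU: intersection area over union area.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between matching corners.
    d1_sq = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2  # top-left corners
    d2_sq = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2  # bottom-right corners

    # Normalize by the squared image diagonal.
    norm = img_w ** 2 + img_h ** 2
    return iou - d1_sq / norm - d2_sq / norm


def mpdiou_loss(pred_box, gt_box, img_w, img_h):
    """Bounding-box regression loss: 1 - MPDIoU (0 for a perfect match)."""
    return 1.0 - mpdiou(pred_box, gt_box, img_w, img_h)


if __name__ == "__main__":
    # Concentric boxes with the same shape but different sizes.
    gt = (40, 40, 60, 60)
    pred = (30, 30, 70, 70)
    print(mpdiou_loss(pred, gt, img_w=100, img_h=100))
```

In the usage example the predicted and ground-truth boxes share a center and an aspect ratio, so a penalty based only on center distance or aspect ratio would vanish; the corner-distance terms still contribute, which is the case the abstract describes as boxes with the same shape but different sizes.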