Host Kristina, Pobar Miran, Ivasic-Kos Marina
Faculty of Informatics and Digital Technologies, University of Rijeka, 51000 Rijeka, Croatia.
Centre for Artificial Intelligence and Cybersecurity, University of Rijeka, 51000 Rijeka, Croatia.
J Imaging. 2023 Apr 13;9(4):80. doi: 10.3390/jimaging9040080.
This paper focuses on image and video content analysis of handball scenes, applying deep learning methods to detect and track the players and recognize their activities. Handball is a team sport played indoors by two teams with a ball, with well-defined goals and rules. The game is dynamic, with fourteen players moving quickly throughout the field in different directions, changing positions and roles from defensive to offensive, and performing different techniques and actions. Such dynamic team sports present challenging and demanding scenarios for object detectors, tracking algorithms, and other computer vision tasks such as action recognition and localization, leaving much room for improvement of existing algorithms. The aim of the paper is to explore computer vision-based solutions for recognizing player actions that can be applied in unconstrained handball scenes with no additional sensors and with modest requirements, allowing broader adoption of computer vision applications in both professional and amateur settings. This paper presents the semi-manual creation of a custom handball action dataset based on automatic player detection and tracking, as well as models for handball action recognition and localization using Inflated 3D Networks (I3D). For the task of player and ball detection, different configurations of You Only Look Once (YOLO) and Mask Region-Based Convolutional Neural Network (Mask R-CNN) models fine-tuned on custom handball datasets are compared with the original YOLOv7 model to select the best detector to be used in tracking-by-detection algorithms. For player tracking, the DeepSORT and Bag of Tricks for SORT (BoT-SORT) algorithms are tested and compared with Mask R-CNN and YOLO detectors. For the task of action recognition, an I3D multi-class model and an ensemble of binary I3D models are trained with different input frame lengths and frame selection strategies, and the best solution for handball action recognition is proposed. The obtained action recognition models perform well on the test set with nine handball action classes, with average F1 measures of 0.69 and 0.75 for the ensemble and multi-class classifiers, respectively. They can be used to automatically index handball videos and facilitate retrieval. Finally, some open issues, challenges in applying deep learning methods in such a dynamic sports environment, and directions for future development are discussed.
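For orientation, the following is a minimal tracking-by-detection sketch of the kind of pipeline the abstract describes, written against the Ultralytics YOLO API with its bundled BoT-SORT tracker; it is an illustrative assumption, not the authors' exact YOLOv7/Mask R-CNN setup, and the weight and video file names are hypothetical placeholders.

```python
# Illustrative sketch only: per-frame YOLO detections are associated into
# player tracks by BoT-SORT; the resulting per-player crops could later be
# cut into fixed-length clips for an I3D action-recognition model.
import cv2
from ultralytics import YOLO

detector = YOLO("yolo_handball.pt")          # hypothetical fine-tuned weights
cap = cv2.VideoCapture("handball_match.mp4")  # hypothetical input video

tracks = {}  # track_id -> list of (frame_idx, [x1, y1, x2, y2]) entries
frame_idx = 0
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # BoT-SORT is one of the trackers bundled with Ultralytics;
    # persist=True keeps track identities across consecutive frames.
    results = detector.track(frame, persist=True, tracker="botsort.yaml", verbose=False)
    boxes = results[0].boxes
    if boxes.id is not None:
        for track_id, xyxy in zip(boxes.id.int().tolist(), boxes.xyxy.tolist()):
            tracks.setdefault(track_id, []).append((frame_idx, xyxy))
    frame_idx += 1
cap.release()

# Each per-player track can now be sampled into clips of different input
# frame lengths (one of the design choices the paper compares) and passed
# to an action-recognition network such as I3D.
```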