Šajina Romeo, Ivašić-Kos Marina
Faculty of Informatics, University of Pula, 52100 Pula, Croatia.
Faculty of Informatics and Digital Technologies, University of Rijeka, 51000 Rijeka, Croatia.
J Imaging. 2022 Nov 10;8(11):308. doi: 10.3390/jimaging8110308.
Player pose estimation is particularly important in sports because it enables more accurate monitoring of athletes' movements and performance, recognition of player actions, analysis of technique, and evaluation of how accurately an action is executed. All of these tasks are extremely demanding in sports that involve rapid athlete movements with irregular changes in speed and position, at varying distances from the camera and with frequent occlusions, especially in team sports with many players on the field. A prerequisite for recognizing a player's actions in video footage and comparing their poses during the execution of an action is detecting the player's pose in each element of that action or technique. First, the player's 2D pose is determined in each video frame and converted into a 3D pose; a tracking method then groups all of the player's poses into a sequence that forms the series of elements of a particular action. Since action recognition and comparison depend significantly on the accuracy of the methods used to estimate and track player pose under real-world conditions, this paper provides an overview and analysis of methods that can be used for player pose estimation and tracking with a monocular camera, along with the corresponding evaluation metrics, using handball scenarios as an example. We evaluated the applicability and robustness of 12 selected two-stage deep learning methods for 3D pose estimation on a public dataset and a custom dataset of handball jump shots, on which the methods had not been trained and which may contain previously unseen poses. Furthermore, this paper proposes methods for retargeting and smoothing 3D pose sequences, which experimentally improved performance for all tested models. Additionally, we evaluated the applicability and robustness of five state-of-the-art tracking methods on a public dataset and a custom dataset of handball training sessions recorded with a monocular camera.
The paper ends with a discussion highlighting the shortcomings of the pose estimation and tracking methods, reflected in problems with locating key skeletal points and in generated poses that do not follow plausible human body structure, which in turn reduces the overall accuracy of action recognition.