Cabado Bruno, Guijarro-Berdiñas Bertha, Padrón Emilio J
Universidade da Coruña, CITIC Research Center, A Coruña 15071, Spain.
CINFO CONTENIDOS INFORMATIVOS PERSONALIZADOS SL, Ciudad de las TIC, A Coruña 15008, Spain.
Data Brief. 2024 Dec 27;58:111265. doi: 10.1016/j.dib.2024.111265. eCollection 2025 Feb.
This paper presents a synthetic dataset of labeled game situations in recordings of federated handball and basketball matches played in Galicia, Spain. The dataset consists of synthetic data generated from real video frames: 308,805 labeled handball frames and 56,578 labeled basketball frames, extracted from 2105 handball and 383 basketball video clips, each 5 s long. Experts manually labeled the video clips of each sport, whereas the individual frames were labeled automatically using computer vision and machine learning techniques. The dataset covers seven classes of game situations: left attack, left counterattack, left penalty, right attack, right counterattack, right penalty, and timeout. In basketball, the penalty class refers to free throws attempted by a player after being fouled by an opponent. Each frame in the dataset is assigned to one of these classes according to the game situation and its context. Importantly, the dataset does not contain the video frames themselves; instead, it provides a synthetic, normalized representation of each frame in JSON format. This tabular data includes the positions of players, referees, and the ball on a normalized field, the velocities of players and referees, and key regions of the court. Player, referee, and ball positions were inferred automatically in each frame by an object detector, followed by a tracking step that follows each object across frames and is used to compute its velocity vector. Finally, the resulting coordinates were normalized through a perspective transformation, so that the data are unaffected by differences in camera placement and configuration between arenas. We refer to this standardized coordinate space as the 'unified space'. The dataset has considerable potential for reuse in sports analytics and machine learning research.
It can serve as a resource for researchers, coaches, and sports enthusiasts, supporting improvements in player performance, game strategy, match broadcasting, and other sports-related technologies.
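The abstract states that detected coordinates are normalized into the 'unified space' through a perspective transformation, but it does not describe the exact procedure. A common way to do this is a planar homography estimated from four court landmarks whose positions are known both in the image and in the target space. The sketch below solves for that homography directly with NumPy. The landmark pixel coordinates, the player pixels, and the choice of a unit square as the unified court are illustrative assumptions, not the authors' values.

```python
import numpy as np


def estimate_homography(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Estimate the 3x3 homography H mapping four image points onto four
    unified-space points, with H[2, 2] fixed to 1 (direct linear transform)."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), rearranged to be linear in h
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        # v = (h21*x + h22*y + h23) / (h31*x + h32*y + 1), rearranged likewise
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def to_unified(H: np.ndarray, points) -> np.ndarray:
    """Map an (N, 2) array of pixel coordinates into the unified space."""
    pts = np.asarray(points, dtype=float).reshape(-1, 2)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homogeneous @ H.T
    # Divide by the third homogeneous coordinate to undo the projective scaling.
    return mapped[:, :2] / mapped[:, 2:3]


if __name__ == "__main__":
    # Hypothetical pixel positions of the four court corners in one camera view
    # (a trapezoid, as seen from an elevated side camera).
    court_px = np.array([[100, 400], [1180, 400], [1260, 700], [20, 700]], dtype=float)
    # Assumed unified space: the court mapped onto the unit square.
    court_unified = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)

    H = estimate_homography(court_px, court_unified)
    # Hypothetical ground-contact pixels of two detected players.
    print(to_unified(H, [[640, 550], [300, 450]]))
```

OpenCV's `cv2.getPerspectiveTransform` and `cv2.perspectiveTransform` perform the same two steps; the explicit linear solve is shown only to make the mapping visible. With more than four landmarks, a least-squares or robust fit such as `cv2.findHomography` is the usual choice. The abstract does not say which of these, if any, the authors used.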
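The abstract also says that a tracking step follows each detected object across frames so that velocity vectors can be computed, and that every frame is stored as a JSON record containing positions, velocities, and one game-situation label. The sketch below assumes the tracker output is already available as a mapping from track ID to unified-space position for each frame, and estimates velocity with a first-order finite difference between consecutive frames. The record layout (`frame`, `label`, `players`, `ball`, `vx`, `vy`), the helper names, and the frame rate are hypothetical and do not reproduce the published schema; only the seven class names come from the abstract.

```python
import json
from typing import Optional

# The seven game-situation classes listed in the abstract; the exact label
# strings used in the released JSON files are an assumption.
CLASSES = (
    "left attack", "left counterattack", "left penalty",
    "right attack", "right counterattack", "right penalty", "timeout",
)

Position = tuple[float, float]


def velocities(prev: dict[int, Position], curr: dict[int, Position],
               fps: float) -> dict[int, Position]:
    """First-order finite-difference velocity, in unified-space units per second,
    for every track ID present in both consecutive frames."""
    return {
        tid: ((x - prev[tid][0]) * fps, (y - prev[tid][1]) * fps)
        for tid, (x, y) in curr.items()
        if tid in prev
    }


def frame_record(frame_idx: int, label: str, prev: dict[int, Position],
                 curr: dict[int, Position], ball: Optional[Position],
                 fps: float) -> dict:
    """Build one hypothetical per-frame record; velocity is None for tracks
    that first appear in this frame."""
    if label not in CLASSES:
        raise ValueError(f"unknown game-situation class: {label!r}")
    vel = velocities(prev, curr, fps)
    players = []
    for tid, (x, y) in sorted(curr.items()):
        v = vel.get(tid)
        players.append({"id": tid, "x": x, "y": y,
                        "vx": v[0] if v else None, "vy": v[1] if v else None})
    return {
        "frame": frame_idx,
        "label": label,
        "players": players,
        "ball": None if ball is None else {"x": ball[0], "y": ball[1]},
    }


if __name__ == "__main__":
    prev = {1: (0.50, 0.50), 2: (0.20, 0.30)}
    curr = {1: (0.51, 0.50), 2: (0.20, 0.31), 3: (0.70, 0.40)}
    # 25.0 frames per second is an illustrative value, not taken from the paper.
    record = frame_record(42, "left attack", prev, curr, ball=(0.45, 0.52), fps=25.0)
    print(json.dumps(record, indent=2))
```

A plain first difference is the simplest estimator but amplifies detection jitter. Smoothing positions over a few frames before differencing, or reading velocity from a Kalman-filter tracker state, would be reasonable alternatives. The abstract does not state which approach the authors used.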