Chen Xiao, Liu Zhi
College of Physical Education and Recreation, Guangdong Ocean University, Zhanjiang, 524000, China.
Sports Department, University of Electronic Science and Technology of China, Chengdu, 611731, Sichuan, China.
Sci Rep. 2025 Aug 27;15(1):31571. doi: 10.1038/s41598-025-16752-5.
Group activity recognition in sports analysis is a critical challenge in computer vision, requiring robust modeling of complex player interactions and dynamic scenarios. Existing approaches predominantly rely on region-based features and two-stage pipelines that first localize individuals and then classify activities. These methods are inherently limited by their dependence on accurate bounding box detection and often struggle with feature entanglement, occlusions, and the integration of broader contextual information. To address these gaps, this study introduces a hierarchical query design and a distributed attention framework within a transformer architecture, tailored specifically to player group activity recognition in sports. The proposed model, the hierarchical attention query transformer (HAQT), uses a novel dual-pathway architecture to decouple individual and group activity recognition. The hierarchical query design disentangles individual-level and group-level features, while the distributed attention mechanism refines communication within and across player groups. Additionally, the deformable transformer backbone dynamically aggregates multi-scale spatiotemporal features, enhancing the model's robustness to occlusions, variable player formations, and motion dynamics. The proposed set prediction paradigm eliminates reliance on bounding box accuracy, enabling precise player localization and activity classification. Comprehensive experiments on the Volleyball and Basketball-51 datasets validate the effectiveness of HAQT. On the Volleyball dataset, HAQT achieves a state-of-the-art mean Average Precision (mAP) of 92.8% for group activity recognition, surpassing existing models. On the Basketball-51 dataset, it achieves an accuracy of 92.76%, demonstrating its ability to model complex spatiotemporal dependencies.
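The set prediction paradigm mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how predictions are matched to ground-truth players, so the code below assumes a DETR-style one-to-one bipartite matching. The function names, the dictionary layout, the cost formula (one minus the class probability plus an L1 box distance), and the toy data are all hypothetical and are not taken from the paper. Brute-force search over permutations is used only to keep the example dependency-free; a practical implementation would more likely use the Hungarian algorithm, for example scipy.optimize.linear_sum_assignment.

```python
from itertools import permutations


def match_cost(pred, target, box_weight=1.0):
    """Cost of assigning one prediction to one ground-truth player.

    Assumed (hypothetical) form: classification cost plus weighted L1 box distance.
    """
    class_cost = 1.0 - pred["probs"][target["action"]]
    box_cost = sum(abs(p - t) for p, t in zip(pred["box"], target["box"]))
    return class_cost + box_weight * box_cost


def match_predictions(preds, targets):
    """Return sorted (pred_index, target_index) pairs with minimal total cost.

    Assumes len(preds) >= len(targets); predictions left unmatched would be
    treated as "no player" in a DETR-style loss.
    """
    best_cost, best_perm = float("inf"), ()
    # Try every way of giving each target a distinct prediction.
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(match_cost(preds[p], targets[t]) for t, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return sorted((p, t) for t, p in enumerate(best_perm))


# Toy data: three query outputs, two annotated players.
# Boxes are (cx, cy, w, h) in normalized image coordinates.
preds = [
    {"probs": [0.1, 0.8, 0.1], "box": (0.70, 0.50, 0.10, 0.30)},
    {"probs": [0.7, 0.2, 0.1], "box": (0.20, 0.50, 0.10, 0.30)},
    {"probs": [0.3, 0.3, 0.4], "box": (0.50, 0.90, 0.10, 0.10)},
]
targets = [
    {"action": 0, "box": (0.22, 0.48, 0.10, 0.30)},
    {"action": 1, "box": (0.68, 0.52, 0.10, 0.30)},
]

matches = match_predictions(preds, targets)
print(matches)  # [(0, 1), (1, 0)]
```

In this toy case prediction 0 is matched to target 1 and prediction 1 to target 0, while prediction 2 stays unmatched. Because each prediction is scored against the full set of targets, the model is not tied to boxes produced by a separate detector, which is the property the abstract attributes to set prediction.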