Sports competition tactical analysis model of cross-modal transfer learning intelligent robot based on Swin Transformer and CLIP.

Author information

Jiang Li, Lu Wang

Affiliation

School of Physical Education of Yantai University, Yantai, China.

Publication information

Front Neurorobot. 2023 Oct 30;17:1275645. doi: 10.3389/fnbot.2023.1275645. eCollection 2023.

DOI: 10.3389/fnbot.2023.1275645

PMID: 37965071

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10642548/

Abstract

INTRODUCTION

This paper presents an innovative Intelligent Robot Sports Competition Tactical Analysis Model that leverages multimodal perception to tackle the pressing challenge of analyzing opponent tactics in sports competitions. The current landscape of sports competition analysis necessitates a comprehensive understanding of opponent strategies. However, traditional methods are often constrained to a single data source or modality, limiting their ability to capture the intricate details of opponent tactics.

METHODS

Our system integrates the Swin Transformer and CLIP models, harnessing cross-modal transfer learning to enable a holistic observation and analysis of opponent tactics. The Swin Transformer is employed to acquire knowledge about opponent action postures and behavioral patterns in basketball or football games, while the CLIP model enhances the system's comprehension of opponent tactical information by establishing semantic associations between images and text. To address potential imbalances and biases between these models, we introduce a cross-modal transfer learning technique that mitigates modal bias issues, thereby enhancing the model's generalization performance on multimodal data.
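The abstract does not specify how the visual and text representations are aligned. One common way to tie two modalities together is a CLIP-style symmetric contrastive loss, sketched below. Everything in the sketch is an assumption made for illustration: the function names, the temperature of 0.07, and the toy embeddings are hypothetical, and the vectors merely stand in for Swin-style visual features and CLIP-style text embeddings that have already been projected into a shared space. It is not the authors' implementation.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def similarity_logits(visual, text, temperature=0.07):
    """Temperature-scaled cosine similarity between every visual/text pair."""
    v = [normalize(row) for row in visual]
    t = [normalize(row) for row in text]
    return [[sum(a * b for a, b in zip(vi, tj)) / temperature for tj in t] for vi in v]


def cross_entropy_on_diagonal(logits):
    """Mean of -log softmax(row_i)[i]: row i is expected to match column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the row maximum for numerical stability
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_norm - row[i]
    return total / len(logits)


def contrastive_alignment_loss(visual, text, temperature=0.07):
    """Symmetric contrastive loss; row i of `visual` and `text` describe the same clip."""
    logits = similarity_logits(visual, text, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(transposed))


# Toy embeddings, made up for illustration (not data from the paper).
text_emb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
visual_matched = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.9]]
visual_shuffled = [visual_matched[1], visual_matched[2], visual_matched[0]]

matched_loss = contrastive_alignment_loss(visual_matched, text_emb)
shuffled_loss = contrastive_alignment_loss(visual_shuffled, text_emb)
print(matched_loss < shuffled_loss)
```

The loss is lower when each visual vector is paired with its own text description than when the pairs are shuffled, which is the signal a training loop would use to pull the two modalities toward a common representation.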

RESULTS

Through cross-modal transfer learning, tactical information learned from images by the Swin Transformer is effectively transferred to the CLIP model, providing coaches and athletes with comprehensive tactical insights. Our method is rigorously tested and validated using Sport UV, Sports-1M, HMDB51, and NPU RGB+D datasets. Experimental results demonstrate the system's impressive performance in terms of prediction accuracy, stability, training time, inference time, number of parameters, and computational complexity. Notably, the system outperforms other models, with a remarkable 8.47% lower prediction error (MAE) on the Kinetics dataset, accompanied by a 72.86-second reduction in training time.
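For readers unfamiliar with the metric, the sketch below shows how mean absolute error (MAE) and a percentage reduction between two models can be computed. The abstract does not state whether the 8.47% figure is a relative or an absolute difference; the helper here computes a relative reduction as one possible reading. The function names and the numbers are hypothetical toy values, not results from the paper.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (baseline - improved) / baseline


# Toy values, made up for illustration (not data from the paper).
targets = [1.0, 2.0, 3.0, 4.0]
baseline_pred = [1.5, 2.5, 2.5, 4.5]
model_pred = [1.25, 2.25, 2.75, 4.25]

baseline_mae = mean_absolute_error(targets, baseline_pred)
model_mae = mean_absolute_error(targets, model_pred)
print(baseline_mae, model_mae, relative_reduction(baseline_mae, model_mae))
```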

DISCUSSION

The presented system proves to be highly suitable for real-time sports competition assistance and analysis, offering a novel and effective approach for an Intelligent Robot Sports Competition Tactical Analysis Model that maximizes the potential of multimodal perception technology. By harnessing the synergies between the Swin Transformer and CLIP models, we address the limitations of traditional methods and significantly advance the field of sports competition analysis. This innovative model opens up new avenues for comprehensive tactical analysis in sports, benefiting coaches, athletes, and sports enthusiasts alike.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b18/10642548/f28700400264/fnbot-17-1275645-g0001.jpg
