Overcoming the problem of multicollinearity in sports performance data: A novel application of partial least squares correlation analysis.

Affiliations

Institute for Sport, Physical Activity and Leisure, Leeds Beckett University, Leeds, West Yorkshire, United Kingdom.

Leeds Rhinos Rugby League club, Leeds, United Kingdom.

Publication information

PLoS One. 2019 Feb 14;14(2):e0211776. doi: 10.1371/journal.pone.0211776. eCollection 2019.

Abstract

OBJECTIVES

Professional sporting organisations invest considerable resources in collecting and analysing data in order to better understand the factors that influence performance. Recent advances in non-invasive technologies, such as global positioning systems (GPS), mean that large volumes of data are now readily available to coaches and sport scientists. However, analysing such data can be challenging, particularly when sample sizes are small and data sets contain multiple highly correlated variables, as is often the case in a sporting context. Multicollinearity in particular, if not treated appropriately, can be problematic and might lead to erroneous conclusions. In this paper we present a novel 'leave one variable out' (LOVO) partial least squares correlation analysis (PLSCA) methodology, designed to overcome the problem of multicollinearity, and show how it can be used to identify the training load (TL) variables that most influence 'end fitness' in young rugby league players.

METHODS

The accumulated TL of sixteen male professional youth rugby league players (17.7 ± 0.9 years) was quantified via GPS, a micro-electrical-mechanical-system (MEMS), and players' session-rating-of-perceived-exertion (sRPE) over a 6-week pre-season training period. Immediately before and after this training period, participants undertook a 30-15 intermittent fitness test (30-15IFT), which was used to determine each player's 'starting fitness' and 'end fitness'. In total, twelve TL variables were collected, and these, along with 'starting fitness' as a covariate, were regressed against 'end fitness'. However, considerable multicollinearity in the data (variance inflation factor, VIF, >1000 for nine variables) meant that the multiple linear regression (MLR) process was unstable, so we developed a novel LOVO PLSCA adaptation to quantify the relative importance of the predictor variables and thus minimise multicollinearity issues. As such, the LOVO PLSCA was used as a tool to inform and refine the MLR process.
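To illustrate the multicollinearity diagnostic mentioned above, the following is a minimal sketch of how a variance inflation factor can be computed: each predictor is regressed on all the others, and VIF = 1 / (1 - R²). The function name `variance_inflation_factors` and the synthetic data are assumptions made for illustration only; they are not the study's code or data.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return one VIF per column of X, using VIF_j = 1 / (1 - R_j^2).

    R_j^2 comes from an ordinary least-squares fit (with intercept)
    of column j on all remaining columns.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # add intercept column
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs

# Synthetic example (hypothetical data, 16 "players"):
# columns 0 and 1 are nearly identical, column 2 is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=16)
b = a + 0.01 * rng.normal(size=16)
c = rng.normal(size=16)
X_demo = np.column_stack([a, b, c])

vifs = variance_inflation_factors(X_demo)
print(vifs)
```

In this toy data the two near-duplicate columns receive very large VIFs while the independent column does not, which is the pattern that makes regression coefficients unstable.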

RESULTS

The LOVO PLSCA identified the distance accumulated at very-high speed (>7 m·s⁻¹) as the TL variable with the greatest influence on improvement in player fitness, with its removal causing the largest decrease in singular value inertia (5.93). When included in a refined linear regression model, this variable, along with 'starting fitness' as a covariate, explained 73% of the variance in v30-15IFT 'end fitness' (p<0.001) and completely eliminated any multicollinearity issues.
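The abstract does not spell out the LOVO PLSCA computation, so the following is only a rough sketch of the general idea under stated assumptions: the cross-correlation matrix between the z-scored predictor block and the z-scored outcome block is decomposed by SVD, "inertia" is taken to be the sum of its singular values, and each predictor is left out in turn to see how much the inertia falls. The function names (`plsca_inertia`, `lovo_inertia_drop`) and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np

def zscore(A):
    """Centre each column and scale it to unit sample standard deviation."""
    A = np.asarray(A, dtype=float)
    return (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)

def plsca_inertia(X, Y):
    """Sum of singular values of the X-Y correlation matrix (assumed definition)."""
    n = X.shape[0]
    R = zscore(X).T @ zscore(Y) / (n - 1)  # correlations between blocks
    return np.linalg.svd(R, compute_uv=False).sum()

def lovo_inertia_drop(X, Y):
    """For each predictor, the fall in inertia when that column is left out."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float).reshape(len(X), -1)  # ensure 2-D outcome block
    full = plsca_inertia(X, Y)
    return np.array([
        full - plsca_inertia(np.delete(X, j, axis=1), Y)
        for j in range(X.shape[1])
    ])

# Synthetic example (hypothetical data): the outcome depends mainly on column 2.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(16, 4))
y_demo = 2.0 * X_demo[:, 2] + 0.1 * rng.normal(size=16)

drops = lovo_inertia_drop(X_demo, y_demo)
most_important = int(np.argmax(drops))
print(drops, most_important)
```

Under these assumptions, the predictor whose removal produces the largest drop is ranked as most important, and it could then be carried forward into a smaller regression model alongside a covariate.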

CONCLUSIONS

The LOVO PLSCA technique appears to be a useful tool for evaluating the relative importance of predictor variables in data sets that exhibit considerable multicollinearity. When used as a filtering tool, LOVO PLSCA produced an MLR model that demonstrated a significant relationship between 'end fitness' and the predictor variable 'accumulated distance at very-high speed' when 'starting fitness' was included as a covariate. As such, LOVO PLSCA may be a useful tool for sport scientists and coaches seeking to analyse data sets obtained using GPS and MEMS technologies.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6dae/6375576/c81804bf7b9b/pone.0211776.g001.jpg
