Pedersen Arve Vorland, Lorås Håvard
Department of Neuromedicine and Movement Science, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway.
Front Psychol. 2017 Apr 25;8:619. doi: 10.3389/fpsyg.2017.00619. eCollection 2017.
Tests or test batteries used for assessing motor skills, either in research studies or in clinical settings, apply a variety of procedures for scoring performance, allowing anywhere from one to ten attempts, of which either the best is scored or an average is computed. The rationale behind scoring procedures is rarely stated, and the number of attempts allowed seems to be decided without much support from research. It is uncertain whether such procedures fairly capture an individual's skill level. Thus, the validity of the tests may be compromised. The present study tested 24 young female soccer players on the juggling of a soccer ball. They were given 10 attempts, and trials were scored according to nine different procedures, including the 'best of' or 'mean of' either one, two, three, five, or ten attempts. Individual raw scores differed widely across trials, but no general effect of trials was found. The mean (SD) percentage difference between the lowest and highest scores was 27.7 (9.9)%, with 17 players (71%) demonstrating a significant change from lowest to highest score. Correlations between raw scores were low across trials, while they were generally higher across scoring procedures. The first trial was significantly different from the remaining trials, both as a raw score and as a scoring procedure. The mean percentage difference between best-of-two and best-of-ten scores was 95%, with 50% of the players demonstrating a significant difference between the two scoring procedures. No significant differences were found across mean-of-rule scorings. Best-of-rule and mean-of-rule scorings were significantly different, except for best-of-two vs. mean-of-two. The mean (SD) difference between highest and lowest rank across players was 6.7 (3.6), with individual rankings within the group varying 33% on average across procedures. One player moved from 3rd to 23rd place because of procedural differences.
Therefore, it is concluded that scoring procedures affect results and may have an impact on test outcomes. This may have consequences for decisions based on test results, such as diagnosis and the selection of intervention groups. We hope that our results will inspire further research into the scoring procedures of the vast number of tests and tasks in common use.
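To make the comparison of scoring procedures concrete, the following is a minimal sketch of how nine 'best of' and 'mean of' scores could be derived from ten ordered attempts and how rankings can shift between them. It is not the authors' analysis code. It assumes that a 'best of k' or 'mean of k' score uses the first k attempts (the abstract does not say how attempts were selected), and the function names and the three example players with their scores are hypothetical.

```python
from statistics import mean

# Attempt counts named in the abstract: one, two, three, five, or ten.
ATTEMPT_COUNTS = (1, 2, 3, 5, 10)


def score_procedures(trials):
    """Return nine scores for one player's ordered list of trial results.

    Assumption: 'best of k' and 'mean of k' both use the FIRST k attempts.
    """
    scores = {}
    for k in ATTEMPT_COUNTS:
        first_k = trials[:k]
        scores[f"best_of_{k}"] = max(first_k)
        scores[f"mean_of_{k}"] = mean(first_k)
    # Best-of-one and mean-of-one are the same number (the first trial),
    # so they count as a single procedure, giving nine in total.
    scores["first_trial"] = scores.pop("best_of_1")
    del scores["mean_of_1"]
    return scores


def rank_players(all_trials, procedure):
    """Rank players (1 = best) under one procedure; ties broken by name."""
    scored = {name: score_procedures(t)[procedure]
              for name, t in all_trials.items()}
    ordered = sorted(scored, key=lambda name: (-scored[name], name))
    return {name: position + 1 for position, name in enumerate(ordered)}


# Hypothetical data: number of consecutive touches in each of ten attempts.
example_trials = {
    "A": [30, 12, 15, 14, 13, 16, 12, 15, 14, 13],
    "B": [10, 20, 18, 22, 25, 21, 24, 40, 23, 22],
    "C": [18, 19, 17, 20, 18, 19, 21, 18, 20, 19],
}

if __name__ == "__main__":
    for procedure in ("first_trial", "best_of_10", "mean_of_10"):
        print(procedure, rank_players(example_trials, procedure))
```

In this made-up data, player A ranks first when only the first trial is scored but last under the mean of ten attempts, which illustrates the kind of procedure-dependent rank change the study reports.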