Atkinson G, Nevill A M
Research Institute for Sport and Exercise Sciences, Liverpool John Moores University, England.
Sports Med. 1998 Oct;26(4):217-38. doi: 10.2165/00007256-199826040-00002.
Minimal measurement error (reliability) during the collection of interval- and ratio-type data is critically important to sports medicine research. The main components of measurement error are systematic bias (e.g. general learning or fatigue effects on the tests) and random error due to biological or mechanical variation. Both error components should be meaningfully quantified for the sports physician to relate the described error to judgements regarding 'analytical goals' (the requirements of the measurement tool for effective practical use) rather than the statistical significance of any reliability indicators. Methods based on correlation coefficients and regression provide an indication of 'relative reliability'. Since these methods are highly influenced by the range of measured values, researchers should be cautious in: (i) concluding acceptable relative reliability even if a correlation is above 0.9; (ii) extrapolating the results of a test-retest correlation to a new sample of individuals involved in an experiment; and (iii) comparing test-retest correlations between different reliability studies. Methods used to describe 'absolute reliability' include the standard error of measurement (SEM), coefficient of variation (CV) and limits of agreement (LOA). These statistics are more appropriate for comparing reliability between different measurement tools in different studies. In multiple-retest studies they can be derived from ANOVA procedures; they also help predict the magnitude of a 'real' change in individual athletes and can be used to estimate statistical power for a repeated-measures experiment. These methods vary considerably in the way they are calculated and their use also assumes the presence (CV) or absence (SEM) of heteroscedasticity. Most methods of calculating SEM and CV represent approximately 68% of the error that is actually present in the repeated measurements for the 'average' individual in the sample.
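As a rough illustration of the SEM and CV statistics named above, the following Python sketch computes both for a two-trial test-retest design. The abstract itself notes that calculation methods vary considerably, so the formulas here are assumptions chosen for simplicity rather than the authors' prescribed method: SEM is taken as the standard deviation of the test-retest differences divided by the square root of 2, and CV as that SEM expressed as a percentage of the grand mean. The function name `sem_and_cv` and the sample values are hypothetical. Under a normality assumption, plus or minus one SEM spans roughly 68% of the random error, which is the coverage the abstract refers to.

```python
import math
import statistics


def sem_and_cv(test, retest):
    """Return (SEM, CV in percent) for paired test-retest scores.

    Assumed formulation (one of several in use):
      SEM = SD(retest - test) / sqrt(2)
      CV  = 100 * SEM / grand mean of all scores
    """
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need two equally long lists with at least two pairs")

    # Differences between trials; their sample SD reflects random error
    # from both trials, hence the division by sqrt(2).
    diffs = [b - a for a, b in zip(test, retest)]
    sem = statistics.stdev(diffs) / math.sqrt(2)

    # Express the error relative to the overall mean score.
    grand_mean = statistics.mean(list(test) + list(retest))
    cv_percent = 100 * sem / grand_mean
    return sem, cv_percent


# Hypothetical scores for four individuals measured twice.
sem, cv = sem_and_cv([10, 12, 14, 16], [11, 12, 15, 16])
print(f"SEM = {sem:.3f}, CV = {cv:.2f}%")
```

Because SEM is in the units of the measurement and CV is a percentage, the choice between them should follow the heteroscedasticity check mentioned in the abstract: SEM assumes the error does not grow with the size of the measured value, whereas CV assumes it does.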
LOA represent the test-retest differences for 95% of a population. The associated Bland-Altman plot shows the measurement error schematically and helps to identify the presence of heteroscedasticity. If there is evidence of heteroscedasticity or non-normality, one should logarithmically transform the data and quote the bias and random error as ratios. This allows simple comparisons of reliability across different measurement tools. It is recommended that sports clinicians and researchers should cite and interpret a number of statistical methods for assessing reliability. We encourage the inclusion of the LOA method, especially the exploration of heteroscedasticity that is inherent in this analysis. We also stress the importance of relating the results of any reliability statistic to 'analytical goals' in sports medicine.
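The LOA calculation and its log-transformed (ratio) variant can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' own procedure: it uses the conventional Bland-Altman form (mean difference plus or minus 1.96 standard deviations of the differences), and for the ratio version it takes natural logarithms of the scores, computes the same quantities, and antilogs them so that bias and random error are quoted as ratios. The function names and sample values are hypothetical, and the ratio version requires strictly positive scores.

```python
import math
import statistics


def limits_of_agreement(test, retest):
    """Return (bias, lower limit, upper limit) in the original units.

    Assumed formulation: bias +/- 1.96 * SD of the test-retest differences.
    """
    diffs = [b - a for a, b in zip(test, retest)]
    bias = statistics.mean(diffs)             # systematic bias
    spread = 1.96 * statistics.stdev(diffs)   # random error component
    return bias, bias - spread, bias + spread


def ratio_limits_of_agreement(test, retest):
    """Return (ratio bias, ratio error factor) from log-transformed scores.

    The 95% ratio limits are ratio_bias / ratio_error and
    ratio_bias * ratio_error. All scores must be strictly positive.
    """
    log_diffs = [math.log(b) - math.log(a) for a, b in zip(test, retest)]
    ratio_bias = math.exp(statistics.mean(log_diffs))
    ratio_error = math.exp(1.96 * statistics.stdev(log_diffs))
    return ratio_bias, ratio_error


# Hypothetical scores for four individuals measured twice.
test_scores = [10, 12, 14, 16]
retest_scores = [11, 12, 15, 16]

bias, lower, upper = limits_of_agreement(test_scores, retest_scores)
print(f"bias = {bias:.2f}, 95% LOA = {lower:.2f} to {upper:.2f}")

r_bias, r_error = ratio_limits_of_agreement(test_scores, retest_scores)
print(f"ratio bias = {r_bias:.3f}, times/divide error = {r_error:.3f}")
```

Plotting each individual's difference against their mean score gives the Bland-Altman plot; a spread of differences that widens as the mean increases would suggest heteroscedasticity and favour the ratio form.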