Hopkins W G
Department of Physiology, School of Medical Sciences and School of Physical Education, University of Otago, Dunedin, New Zealand.
Sports Med. 2000 Jul;30(1):1-15. doi: 10.2165/00007256-200030010-00001.
Reliability refers to the reproducibility of values of a test, assay or other measurement in repeated trials on the same individuals. Better reliability implies better precision of single measurements and better tracking of changes in measurements in research or practical settings. The main measures of reliability are within-subject random variation, systematic change in the mean, and retest correlation. A simple, adaptable form of within-subject variation is the typical (standard) error of measurement: the standard deviation of an individual's repeated measurements. For many measurements in sports medicine and science, the typical error is best expressed as a coefficient of variation (percentage of the mean). A biased, more limited form of within-subject variation is the limits of agreement: the 95% likely range of change of an individual's measurements between 2 trials. Systematic changes in the mean of a measure between consecutive trials represent such effects as learning, motivation or fatigue; these changes need to be eliminated from estimates of within-subject variation. Retest correlation is difficult to interpret, mainly because its value is sensitive to the heterogeneity of the sample of participants. Uses of reliability include decision-making when monitoring individuals, comparison of tests or equipment, estimation of sample size in experiments and estimation of the magnitude of individual differences in the response to a treatment. Reasonable precision for estimates of reliability requires approximately 50 study participants and at least 3 trials. Studies aimed at assessing variation in reliability between tests or equipment require complex designs and analyses that researchers seldom perform correctly. A wider understanding of reliability and adoption of the typical error as the standard measure of reliability would improve the assessment of tests and equipment in our disciplines.
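As an illustration of the measures named above, the following is a minimal sketch for the simplest case of two trials on the same individuals. It rests on assumptions not stated in the abstract: the typical error is taken as the standard deviation of the difference scores divided by the square root of 2, the coefficient of variation is approximated as the typical error divided by the grand mean (rather than derived from log-transformed data), and the limits of agreement use the large-sample multiplier 1.96. The function name and the returned keys are hypothetical.

```python
import math
import statistics


def reliability_two_trials(trial1, trial2):
    """Sketch of basic reliability statistics for two trials on the same subjects.

    Assumptions (not taken from the abstract): typical error = SD of the
    difference scores / sqrt(2); CV is a simple ratio to the grand mean;
    limits of agreement use the large-sample factor 1.96.
    """
    if len(trial1) != len(trial2) or len(trial1) < 2:
        raise ValueError("need paired measurements from at least 2 subjects")

    # Difference score for each individual (trial 2 minus trial 1).
    diffs = [b - a for a, b in zip(trial1, trial2)]

    # Systematic change in the mean between trials (learning, fatigue, ...).
    change_in_mean = statistics.mean(diffs)

    # The SD of the differences ignores any shift in the mean, so the
    # systematic change does not inflate the within-subject variation.
    sd_diff = statistics.stdev(diffs)
    typical_error = sd_diff / math.sqrt(2)

    # Typical error as a percentage of the grand mean (approximate CV).
    grand_mean = statistics.mean(list(trial1) + list(trial2))
    cv_percent = 100 * typical_error / grand_mean

    # 95% limits of agreement for an individual's change between the trials.
    half_width = 1.96 * sd_diff
    limits_of_agreement = (change_in_mean - half_width,
                           change_in_mean + half_width)

    return {
        "change_in_mean": change_in_mean,
        "typical_error": typical_error,
        "cv_percent": cv_percent,
        "limits_of_agreement": limits_of_agreement,
    }


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    result = reliability_two_trials([10, 12, 14, 16], [11, 12, 16, 17])
    for name, value in result.items():
        print(name, value)
```

With more than two trials, the same quantities would normally be estimated per consecutive pair of trials or from an analysis of variance, which this sketch does not attempt.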