Sheikh K
Arch Phys Med Rehabil. 1986 Apr;67(4):245-9.
Modified or newly developed disability scales must be assessed for validity against an appropriate standard and for reproducibility: inter- and intraobserver variability and intrasubject variability. For ordinal scales with more than two points or categories, correlation or regression coefficients are appropriate estimates of validity; an assessment of the sensitivity and specificity of such a scale is not feasible. Indices of proportional agreement or correlation analysis are frequently used to assess the reproducibility of disability scales. These procedures do not, however, correct for the agreement expected by chance between two or more sets of observations. In a study of a 31-point ADL (activities of daily living) index used to measure the level of disability in patients with chronic diseases, scores independently rated by two observers were strongly correlated (r = 0.962), yet the two sets of observations were significantly different. An estimate of the kappa statistic, which corrects for chance agreement, showed that overall agreement between the observers was in fact poor (36.3%). It is concluded that the correlation coefficient often overestimates the degree of true agreement, may conceal significant disagreements, and may give misleading information about reproducibility. The kappa statistic should always be used in assessments of the reproducibility of disability scales. Agreement among more than two sets of observations can also be assessed by estimating kappa.
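The abstract does not spell out the computation, but the chance correction it describes is the familiar Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected from the two raters' marginal category frequencies. The Python sketch below illustrates that calculation under this assumption; the function name and the two observers' scores are hypothetical illustrations, not data from the study.

from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters scoring the same subjects."""
    n = len(ratings_a)
    # Observed proportion of exact agreement (the crude index the abstract criticizes).
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance-expected agreement from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two observers rating ten subjects on a 4-category ordinal scale.
obs1 = [0, 1, 2, 2, 3, 1, 0, 2, 3, 1]
obs2 = [0, 1, 2, 3, 3, 1, 1, 2, 3, 2]
print(cohens_kappa(obs1, obs2))  # raw agreement is 0.70, kappa is about 0.59

In this toy example the raw proportional agreement is 70%, yet kappa is only about 0.59; this gap between crude and chance-corrected agreement is exactly the discrepancy the abstract reports for the ADL index.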