Ottenbacher K J, Tomchek S D
School of Health Related Professions, State University of New York, Buffalo 14214.
Am J Occup Ther. 1993 Jan;47(1):10-6. doi: 10.5014/ajot.47.1.10.
Twenty studies examining the reliability of assessment devices and outcome measures in therapeutic research were reviewed and analyzed. The 20 investigations contained 215 quantitative reliability values published in either the American Journal of Occupational Therapy or Physical Therapy during the past 5 years. The reliability studies were classified as interrater, intrarater, test-retest, or internal consistency. Examination of interrater reliability accounted for 41% of all reported reliability values. Studies published in Physical Therapy were more likely to be concerned with test-retest reliability, whereas studies published in the American Journal of Occupational Therapy more often focused on interrater reliability. Examination of the data revealed that the intraclass correlation coefficient (ICC) was the most frequently reported estimate of reliability, accounting for 57% of all reported reliability coefficients. Further review of the results indicated that Pearson product-moment correlations and percentage of agreement indexes together accounted for 22% of all reliability values reported in the studies examined. The Pearson product-moment correlation measures association or covariation among variables, but not agreement, and percentage agreement indexes do not correct for chance agreement. The argument is made that product-moment correlations and percentage agreement indexes are inadequate measures of interrater, intrarater, or test-retest agreement and should be used and interpreted with caution.
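The two cautions in the abstract can be illustrated with a small numerical sketch. The data below are invented for illustration and do not come from the reviewed studies. The sketch assumes a two-rater design, uses a two-way random-effects, absolute-agreement, single-rater ICC as one common ICC form (the abstract does not say which forms the studies used), and uses Cohen's kappa as one example of a chance-corrected index (kappa is not named in the abstract).

```python
from collections import Counter
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def icc_absolute_agreement(x, y):
    """Two-way random-effects, absolute-agreement, single-rater ICC
    for exactly two raters (an assumed choice of ICC form)."""
    n, k = len(x), 2
    scores = list(x) + list(y)
    grand = mean(scores)
    subject_means = [(a + b) / 2 for a, b in zip(x, y)]
    rater_means = [mean(x), mean(y)]
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((v - grand) ** 2 for v in scores)
    ss_error = ss_total - ss_subjects - ss_raters
    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


def percent_agreement(a, b):
    """Proportion of cases on which the two raters give the same category."""
    return sum(p == q for p, q in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[c] * counts_b[c] for c in set(a) | set(b)) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical continuous scores: rater B is always 5 points above rater A.
rater_a = [10, 12, 14, 16, 18]
rater_b = [15, 17, 19, 21, 23]
r = pearson_r(rater_a, rater_b)                 # 1.0: perfect association
icc = icc_absolute_agreement(rater_a, rater_b)  # about 0.44: poor agreement

# Hypothetical categorical ratings of 20 patients, one category very common.
cat_a = ["normal"] * 18 + ["impaired"] * 2
cat_b = ["normal"] * 16 + ["impaired"] * 2 + ["normal"] * 2
agreement = percent_agreement(cat_a, cat_b)     # 0.80
kappa = cohen_kappa(cat_a, cat_b)               # about -0.11

print(f"Pearson r = {r:.2f}, ICC = {icc:.2f}")
print(f"Percent agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

In the first data set the raters never give the same score, yet the Pearson correlation is perfect because it ignores the constant 5-point offset; the absolute-agreement ICC penalizes that offset. In the second, 80% raw agreement is slightly below what chance alone would produce (82%) given how often both raters choose "normal", so kappa is negative.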