Maher Christopher G, Sherrington Catherine, Herbert Robert D, Moseley Anne M, Elkins Mark
School of Physiotherapy, Faculty of Health Sciences, The University of Sydney, PO Box 170, Lidcombe, New South Wales 1825, Australia.
Phys Ther. 2003 Aug;83(8):713-21.
Assessment of the quality of randomized controlled trials (RCTs) is common practice in systematic reviews. However, the reliability of data obtained with most quality assessment scales has not been established. This report describes 2 studies designed to investigate the reliability of data obtained with the Physiotherapy Evidence Database (PEDro) scale developed to rate the quality of RCTs evaluating physical therapist interventions.
In the first study, 11 raters independently rated 25 RCTs randomly selected from the PEDro database. In the second study, 2 raters rated 120 RCTs randomly selected from the PEDro database, and disagreements were resolved by a third rater; this generated a set of individual rater and consensus ratings. The process was repeated by independent raters to create a second set of individual and consensus ratings. Reliability of ratings of PEDro scale items was calculated using multirater kappas, and reliability of the total (summed) score was calculated using intraclass correlation coefficients (ICC [1,1]).
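For readers unfamiliar with the two statistics named above, the standard definitions are sketched below in conventional notation; these formulas are not reproduced from the article itself, which should be consulted for the exact procedures used.

$$
\kappa = \frac{\bar{P} - P_e}{1 - P_e},
\qquad
\mathrm{ICC}(1,1) = \frac{MS_B - MS_W}{MS_B + (k - 1)\,MS_W},
$$

where $\bar{P}$ is the mean observed agreement across trials, $P_e$ is the agreement expected by chance from the marginal category proportions, $MS_B$ and $MS_W$ are the between-trial and within-trial mean squares from a one-way random-effects analysis of variance, and $k$ is the number of ratings per trial.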
The kappa value for each of the 11 items ranged from .36 to .80 for individual assessors and from .50 to .79 for consensus ratings generated by groups of 2 or 3 raters. The ICC for the total score was .56 (95% confidence interval = .47-.65) for ratings by individuals, and the ICC for consensus ratings was .68 (95% confidence interval = .57-.76).
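As a rough, hypothetical illustration of how statistics of this kind can be computed from a trials-by-raters matrix of ratings, the short NumPy sketch below implements a Fleiss-type multirater kappa for a single yes/no scale item and a one-way random-effects ICC(1,1) for total scores. The example data are invented, and the sketch is not the authors' analysis code.

# Minimal sketch: Fleiss-type multirater kappa (per item) and ICC(1,1) (total score).
# Hypothetical toy data only; not taken from the study.
import numpy as np

def fleiss_kappa(counts):
    """counts: (trials x categories) array; counts[i, j] = number of raters who
    scored trial i in category j. Every row must sum to the same number of raters."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=1)[0]                      # raters per trial
    p_j = counts.sum(axis=0) / counts.sum()        # marginal category proportions
    P_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))  # per-trial agreement
    P_bar, P_e = P_i.mean(), np.square(p_j).sum()
    return (P_bar - P_e) / (1 - P_e)

def icc_1_1(scores):
    """scores: (trials x raters) array of total scores.
    One-way random-effects ICC(1,1) = (MSB - MSW) / (MSB + (k - 1) * MSW)."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand = scores.mean()
    ssb = k * np.square(scores.mean(axis=1) - grand).sum()                 # between trials
    ssw = np.square(scores - scores.mean(axis=1, keepdims=True)).sum()     # within trials
    msb, msw = ssb / (n - 1), ssw / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Invented example: 5 trials rated by 3 raters (yes/no tallies for one item, plus totals).
item_counts = np.array([[3, 0], [2, 1], [0, 3], [1, 2], [3, 0]])
total_scores = np.array([[7, 6, 7], [5, 5, 4], [8, 8, 8], [3, 4, 3], [6, 6, 5]])
print(fleiss_kappa(item_counts), icc_1_1(total_scores))

Running the sketch prints a kappa of about .44 for the toy item tallies and an ICC close to 1 for the highly concordant toy total scores, which shows the scale on which the values reported above should be read.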
The reliability of ratings of PEDro scale items varied from "fair" to "substantial," and the reliability of the total PEDro score was "fair" to "good."