Au Flora, Prahardhi Shirlina, Shiell Alan
Centre for Health and Policy Studies, Department of Community Health Sciences, University of Calgary, Calgary, AB, Canada.
Value Health. 2008 May-Jun;11(3):435-9. doi: 10.1111/j.1524-4733.2007.00255.x.
To assess the reliability of two instruments designed for critical appraisal of economic evaluations: the Quality of Health Economic Studies (QHES) scale and the Pediatric Quality Appraisal Questionnaire (PQAQ).
Thirty published articles were chosen at random from a recent bibliography of economic evaluations in health promotion. The quality of each study was assessed independently by two raters using each of the two instruments. Inter-rater reliability and agreement between the instruments were measured using intraclass correlation coefficients (ICCs). Cronbach's generalizability theory was also used to partition the sources of variation in the quality scores and to indicate where improvements in reliability could best be made.
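One common form of the ICC for this kind of design is the two-way random-effects, single-measure coefficient, ICC(2,1) (Shrout & Fleiss); a minimal Python sketch, assuming that form and using simulated scores in place of the study's data, is:

```python
import numpy as np

def icc_2_1(scores: np.ndarray) -> float:
    """Two-way random-effects, single-measure ICC (Shrout & Fleiss ICC(2,1)).

    `scores` is an n-studies x k-raters matrix of quality scores.
    """
    n, k = scores.shape
    grand = scores.mean()
    study_means = scores.mean(axis=1)   # mean score per appraised study
    rater_means = scores.mean(axis=0)   # mean score per rater

    ms_study = k * np.sum((study_means - grand) ** 2) / (n - 1)
    ms_rater = n * np.sum((rater_means - grand) ** 2) / (k - 1)
    resid = scores - study_means[:, None] - rater_means[None, :] + grand
    ms_error = np.sum(resid ** 2) / ((n - 1) * (k - 1))

    return (ms_study - ms_error) / (
        ms_study + (k - 1) * ms_error + k * (ms_rater - ms_error) / n
    )

# Illustrative data only: 30 studies scored (0-100) by 2 raters on one instrument.
rng = np.random.default_rng(0)
quality = rng.uniform(40, 95, size=30)                       # latent study quality
ratings = quality[:, None] + rng.normal(0, 5, size=(30, 2))  # plus rater noise
print(f"ICC(2,1) = {icc_2_1(ratings):.2f}")
```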
Inter-rater reliability was excellent for both instruments (ICC = 0.81 for the QHES and 0.80 for the PQAQ). Agreement between the instruments varied (ICC = 0.77 for rater 1 and 0.56 for rater 2). The biggest source of variation in the scores assigned to the articles was the quality of the study (56% of total variance). Conventional measurement error explained 31% of the total variance. Variation due to rater (< 0.1%) and measurement instrument (1.8%) was very low.
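A generalizability-theory decomposition of this kind can be sketched for a fully crossed study x rater x instrument design with one score per cell (an assumption made here for illustration; the simulated numbers below are not the study's). Variance components are solved from the three-way ANOVA mean squares and expressed as shares of total variance:

```python
import numpy as np

def g_study_shares(scores: np.ndarray) -> dict:
    """Variance-component shares for a P-studies x R-raters x I-instruments array."""
    P, R, I = scores.shape
    grand = scores.mean()
    m_p = scores.mean(axis=(1, 2))   # per study
    m_r = scores.mean(axis=(0, 2))   # per rater
    m_i = scores.mean(axis=(0, 1))   # per instrument
    m_pr = scores.mean(axis=2)
    m_pi = scores.mean(axis=1)
    m_ri = scores.mean(axis=0)

    # Mean squares for main effects, two-way interactions, and the residual.
    ms_p = R * I * np.sum((m_p - grand) ** 2) / (P - 1)
    ms_r = P * I * np.sum((m_r - grand) ** 2) / (R - 1)
    ms_i = P * R * np.sum((m_i - grand) ** 2) / (I - 1)
    ms_pr = I * np.sum((m_pr - m_p[:, None] - m_r[None, :] + grand) ** 2) / ((P - 1) * (R - 1))
    ms_pi = R * np.sum((m_pi - m_p[:, None] - m_i[None, :] + grand) ** 2) / ((P - 1) * (I - 1))
    ms_ri = P * np.sum((m_ri - m_r[:, None] - m_i[None, :] + grand) ** 2) / ((R - 1) * (I - 1))
    resid = (scores - m_pr[:, :, None] - m_pi[:, None, :] - m_ri[None, :, :]
             + m_p[:, None, None] + m_r[None, :, None] + m_i[None, None, :] - grand)
    ms_e = np.sum(resid ** 2) / ((P - 1) * (R - 1) * (I - 1))

    # Expected-mean-squares solutions; negative estimates truncated at zero.
    comps = {
        "study": (ms_p - ms_pr - ms_pi + ms_e) / (R * I),
        "rater": (ms_r - ms_pr - ms_ri + ms_e) / (P * I),
        "instrument": (ms_i - ms_pi - ms_ri + ms_e) / (P * R),
        "study x rater": (ms_pr - ms_e) / I,
        "study x instrument": (ms_pi - ms_e) / R,
        "rater x instrument": (ms_ri - ms_e) / P,
        "error": ms_e,
    }
    comps = {name: max(v, 0.0) for name, v in comps.items()}
    total = sum(comps.values())
    return {name: v / total for name, v in comps.items()}

# Illustrative data only: 30 studies x 2 raters x 2 instruments.
rng = np.random.default_rng(1)
study_effect = rng.normal(0, 8, size=(30, 1, 1))
noise = rng.normal(0, 5, size=(30, 2, 2))
scores = 70 + study_effect + noise
for name, share in g_study_shares(scores).items():
    print(f"{name:<20s}{share:6.1%} of total variance")
```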
The results suggest that the two instruments perform equally well. The choice of instrument can therefore be based on other criteria: simplicity and speed of application in the case of one, and the detail of the information provided in the case of the other. There is little improvement in reliability to be gained from using more than one rater or more than one assessment of quality.