Valentine Jeffrey C, Cooper Harris
Department of Educational and Counseling Psychology, College of Education and Human Development, University of Louisville, Louisville, KY 40292, USA.
Psychol Methods. 2008 Jun;13(2):130-49. doi: 10.1037/1082-989X.13.2.130.
Assessments of studies meant to evaluate the effectiveness of interventions, programs, and policies can play an important role in the interpretation of research results. However, evidence suggests that available quality assessment tools have poor measurement characteristics and can lead to opposing conclusions when applied to the same body of studies. These tools tend to (a) be insufficiently operational, (b) rely on arbitrary post hoc decision rules, and (c) reduce a multidimensional construct to a single number. In response to these limitations, a multilevel and hierarchical instrument was developed in consultation with a wide range of methodological and statistical experts. The instrument focuses on the operational details of studies and yields a profile of scores, rather than a single score, to represent study quality. A pilot test suggested that satisfactory between-judge agreement can be obtained with well-trained raters working under naturalistic conditions. Limitations of the instrument are discussed, but these are inherent in making decisions about study quality given incomplete reporting and in the absence of strong, contextually based information about the effects of design flaws on study outcomes.