Testing the Risk of Bias tool showed low reliability between individual reviewers and across consensus assessments of reviewer pairs.

Affiliation

Department of Pediatrics, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, University of Alberta, 4-472 Edmonton Clinic Health Academy, 11405-87 Avenue, Edmonton, Alberta, Canada.

Publication information

J Clin Epidemiol. 2013 Sep;66(9):973-81. doi: 10.1016/j.jclinepi.2012.07.005. Epub 2012 Sep 13.

Abstract

OBJECTIVES

To assess the reliability of the Cochrane Risk of Bias (ROB) tool between individual raters and across consensus agreements of pairs of reviewers, and to examine the impact of study-level factors on reliability.

STUDY DESIGN AND SETTING

Two reviewers assessed risk of bias for 154 randomized controlled trials (RCTs). For 30 RCTs, two reviewers from each of four centers assessed risk of bias and reached consensus. We assessed interrater agreement using kappas and the impact of study-level factors through subgroup analyses.
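
As a point of reference for the agreement statistic used here, the following is a minimal, self-contained sketch of an unweighted Cohen's kappa for two raters' per-trial judgements on a single ROB domain. The example ratings and variable names are hypothetical and are not taken from the study; the paper itself may have used a weighted variant.

```python
# Minimal sketch (not from the paper): unweighted Cohen's kappa for two raters'
# categorical judgements on one risk-of-bias domain. Ratings are hypothetical.
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of
    agreement and p_e is the agreement expected by chance from the raters'
    marginal category frequencies."""
    assert len(rater_a) == len(rater_b) and len(rater_a) > 0
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical sequence-generation judgements for 10 trials.
reviewer_1 = ["low", "low", "unclear", "high", "low",
              "low", "unclear", "low", "high", "low"]
reviewer_2 = ["low", "low", "unclear", "high", "low",
              "unclear", "unclear", "low", "low", "low"]
print(round(cohen_kappa(reviewer_1, reviewer_2), 2))  # 0.64 for this toy data
```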

RESULTS

Reliability between two reviewers was fair for most domains (κ=0.24-0.37), except sequence generation (κ=0.79, substantial). Reliability of consensus assessments across reviewer pairs was moderate for sequence generation (κ=0.60), fair for allocation concealment and "other sources of bias" (κ=0.37-0.27), and slight for the remaining domains (κ=0.05-0.09). Reliability was influenced by the nature of the outcome, nature of the intervention, study design, trial hypothesis, and funding source. Variability resulted from different interpretations of the tool rather than from different information identified in the study reports.
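
The descriptive labels above (slight, fair, moderate, substantial) correspond to the widely used Landis and Koch (1977) benchmarks for kappa. Assuming that is the scale intended, the small illustrative helper below makes the mapping explicit.

```python
# Landis and Koch (1977) benchmarks, which appear to be the scale behind the
# "slight"/"fair"/"moderate"/"substantial" labels in the results above.
def kappa_label(kappa):
    if kappa < 0.00:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"

# Values reported in the abstract.
for k in (0.05, 0.27, 0.37, 0.60, 0.79):
    print(k, kappa_label(k))
```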

CONCLUSION

Low agreement has implications for interpreting systematic reviews. These findings suggest the need for detailed guidance in assessing the risk of bias.

