Institute for Research in Operative Medicine, Faculty of Health - School of Medicine, Witten/Herdecke University, Ostmerheimer Str. 200, Building 38, 51109, Cologne, Germany.
LIFE Child, LIFE Leipzig Research Center for Civilization Diseases, Leipzig University, Ph.-Rosenthal-Str. 27, 04103, Leipzig, Germany.
BMC Med Res Methodol. 2021 Mar 11;21(1):51. doi: 10.1186/s12874-021-01231-y.
Systematic reviews (SRs) can lay the groundwork for evidence-based health care decision-making, so their methodological quality is crucial. AMSTAR (A Measurement Tool to Assess Systematic Reviews) is a widely used tool developed to assess the methodological quality of SRs of randomized controlled trials (RCTs). Research shows that AMSTAR appears to be valid and reliable in terms of interrater reliability (IRR), but its test-retest reliability (TRR) has never been investigated. In our study we investigated the TRR of AMSTAR to evaluate the importance of its measurement and to contribute to the discussion of the measurement properties of AMSTAR and other quality assessment tools.
Seven raters at three institutions independently assessed the methodological quality of SRs in the field of occupational health with AMSTAR. Approximately two years elapsed between the first and second ratings. Answers were dichotomized, and we calculated the TRR for each rater and each AMSTAR item using Gwet's AC1 coefficient. To investigate the impact of variation in the ratings over time, we obtained summary scores for each review.
AMSTAR item 4 (Was the status of publication used as an inclusion criterion?) had the lowest median TRR, at 0.53 (moderate agreement). AMSTAR item 1 was the only item with perfect agreement across all raters (Gwet's AC1 of 1). The median TRR of the individual raters varied between 0.69 (substantial agreement) and 0.89 (almost perfect agreement). Variation of two or more points in yes-scored AMSTAR items was observed in 65% (73/112) of all assessments.
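The summary-score comparison can be sketched as follows, assuming the summary score of an assessment is the count of items answered "yes" and that variation means the absolute difference between the first and second score. The function names and the example scores are hypothetical and are not taken from the study data.

```python
from typing import Sequence


def summary_score(item_answers: Sequence[int]) -> int:
    """Number of AMSTAR items scored 'yes' (1) in one assessment."""
    return sum(item_answers)


def share_with_shift(first_scores: Sequence[int],
                     second_scores: Sequence[int],
                     min_shift: int = 2) -> float:
    """Share of assessments whose summary score changed by >= min_shift points."""
    if len(first_scores) != len(second_scores) or len(first_scores) == 0:
        raise ValueError("score lists must be non-empty and of equal length")
    shifted = sum(abs(a - b) >= min_shift
                  for a, b in zip(first_scores, second_scores))
    return shifted / len(first_scores)


# Made-up summary scores for four assessments at both time points.
first_scores = [7, 5, 9, 4]
second_scores = [5, 5, 10, 8]
print(share_with_shift(first_scores, second_scores))
```

Applied to the reported counts, 73 of 112 assessments with a shift of two or more points corresponds to 73/112 ≈ 0.65.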
The high variation between the first and second AMSTAR ratings suggests that the TRR should be considered when evaluating the psychometric properties of AMSTAR. However, more evidence is needed on this neglected measurement property. Our results may initiate discussion of the importance of considering the TRR of assessment tools. Further examination of the TRR of AMSTAR, as well as of other recently established rating tools such as AMSTAR 2 and ROBIS (Risk Of Bias In Systematic reviews), would be useful.