Christopher Baethge, Sandra Goldbeck-Wood, Stephan Mertens
Deutsches Ärzteblatt and Deutsches Ärzteblatt International, Dieselstraße 2, D-50859 Cologne, Germany.
Department of Psychiatry and Psychotherapy, University of Cologne Medical School, Cologne, Germany.
Res Integr Peer Rev. 2019 Mar 26;4:5. doi: 10.1186/s41073-019-0064-8. eCollection 2019.
Narrative reviews are the most common type of article in the medical literature. However, unlike for systematic reviews and reports of randomized controlled trials (RCTs), for which formal quality-assessment instruments exist, no instrument is currently available to assess the quality of narrative reviews. In response to this gap, we developed SANRA, the Scale for the Assessment of Narrative Review Articles.
A team of three experienced journal editors modified or deleted items in an earlier version of SANRA based on face validity, item-total correlations, and reliability scores from previous tests. We deleted an item addressing a manuscript's writing and accessibility because of its poor inter-rater reliability. The six items of the revised scale are each rated from 0 (low standard) to 2 (high standard) and cover the following topics: justification of (1) the importance and (2) the aims of the review; description of (3) the literature search and (4) referencing; and presentation of (5) the level of evidence and (6) relevant endpoint data. For all items, we developed anchor definitions and examples to guide users in filling out the form. The revised scale was tested by the same editors (blinded to each other's ratings) on a set of 30 consecutive non-systematic review manuscripts submitted to a general medical journal.
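The scoring scheme described above can be sketched as a short function. This is a minimal illustration, not the authors' implementation; the item wordings are paraphrased from the abstract.

```python
# Illustrative sketch of SANRA's sum score, assuming hypothetical ratings.
# Item labels are paraphrased from the abstract, not the official form.
SANRA_ITEMS = [
    "Justification of the review's importance",
    "Statement of the aims of the review",
    "Description of the literature search",
    "Referencing",
    "Presentation of the level of evidence",
    "Presentation of relevant endpoint data",
]

def sanra_sum_score(ratings):
    """Sum six item ratings, each 0 (low), 1, or 2 (high standard)."""
    if len(ratings) != len(SANRA_ITEMS):
        raise ValueError("Expected one rating per SANRA item")
    if any(r not in (0, 1, 2) for r in ratings):
        raise ValueError("Each item is rated 0, 1, or 2")
    return sum(ratings)  # total ranges from 0 to 12

print(sanra_sum_score([2, 1, 1, 0, 1, 1]))  # → 6
```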
Raters confirmed that completing the scale is feasible in everyday editorial work. The mean sum score across all 30 manuscripts was 6.0 out of 12 possible points (SD 2.6, range 1-12). Corrected item-total correlations ranged from 0.33 (item 3) to 0.58 (item 6), and Cronbach's alpha was 0.68 (internal consistency). The intra-class correlation coefficient (average measure) was 0.77 [95% CI 0.57, 0.88] (inter-rater reliability). Raters often disagreed on items 1 and 4.
SANRA's feasibility, inter-rater reliability, homogeneity of items, and internal consistency are sufficient for a scale of six items. Further field testing, particularly of validity, is desirable. We recommend rater training based on the "explanations and instructions" document provided with SANRA. In editorial decision-making, SANRA may complement journal-specific evaluation of manuscripts (pertaining to, e.g., audience, originality, or difficulty) and may contribute to improving the standard of non-systematic reviews.