
Comparing machine and human reviewers to evaluate the risk of bias in randomized controlled trials.

Affiliations

Institute of Health Economics (IHE), Edmonton, Alberta, Canada.

Faculty of Rehabilitation Medicine, Department of Physical Therapy/Faculty of Medicine and Dentistry, University of Alberta, Edmonton, Alberta, Canada.

Publication Information

Res Synth Methods. 2020 May;11(3):484-493. doi: 10.1002/jrsm.1398. Epub 2020 Mar 3.

Abstract

BACKGROUND

Evidence from new health technologies is growing, along with demands for evidence to inform policy decisions, creating challenges in completing health technology assessments (HTAs)/systematic reviews (SRs) in a timely manner. Software can decrease the time and burden by automating the process, but evidence validating such software is limited. We tested the accuracy of RobotReviewer, a semi-autonomous risk of bias (RoB) assessment tool, and its agreement with human reviewers.

METHODS

Two reviewers independently conducted RoB assessments on a sample of randomized controlled trials (RCTs), and their consensus ratings were compared with those generated by RobotReviewer. Agreement with the human reviewers was assessed using percent agreement and weighted kappa (κ). The accuracy of RobotReviewer was also assessed by calculating sensitivity, specificity, and the area under the curve (AUC), using the human reviewers' consensus ratings as the reference standard.
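As a minimal illustration of the agreement statistics described above, the following Python sketch computes percent agreement, linearly weighted kappa, sensitivity, specificity, and AUC with scikit-learn. The ratings and the three-level coding (0 = low, 1 = unclear, 2 = high risk of bias) are hypothetical assumptions for illustration, not the study's data.

    # Hypothetical RoB ratings for one domain; not the study's data.
    from sklearn.metrics import cohen_kappa_score, confusion_matrix, roc_auc_score

    human = [0, 1, 2, 0, 1, 2, 0, 0, 1, 2]  # human reviewers' consensus ratings
    robot = [0, 1, 1, 0, 2, 2, 0, 1, 1, 2]  # RobotReviewer's ratings

    # Percent agreement: proportion of trials rated identically.
    percent_agreement = sum(h == r for h, r in zip(human, robot)) / len(human)

    # Weighted kappa: linear weights penalize larger disagreements more.
    kappa = cohen_kappa_score(human, robot, weights="linear")

    # For sensitivity/specificity/AUC, collapse ratings to binary
    # ("high risk" vs. not), treating the human consensus as the reference.
    human_high = [int(h == 2) for h in human]
    robot_high = [int(r == 2) for r in robot]
    tn, fp, fn, tp = confusion_matrix(human_high, robot_high).ravel()
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    auc = roc_auc_score(human_high, robot_high)

    print(percent_agreement, kappa, sensitivity, specificity, auc)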

RESULTS

The study included 372 RCTs. Excluding overall RoB, inter-rater reliability ranged from κ = -0.06 (no agreement) for blinding of participants and personnel to κ = 0.62 (good agreement) for random sequence generation. RobotReviewer used a high percentage of "irrelevant supporting quotations" to support its RoB assessments for blinding of participants and personnel (72.6%), blinding of outcome assessment (70.4%), and allocation concealment (54.3%).

CONCLUSION

RobotReviewer can help with risk of bias assessment of RCTs but cannot replace human evaluations. Reviewers should therefore check and validate RobotReviewer's RoB assessments by consulting the original article whenever the supporting quotations it provides are not relevant. This practice is in line with the developers' own recommendation.

