School of Population Health & Environmental Sciences, Faculty of Life Sciences and Medicine, King's College London, 3rd Floor, Addison House, Guy's Campus, London, SE1 1UL, UK.
Center for Evidence Synthesis in Health, Brown University, Providence, USA.
BMC Med Inform Decis Mak. 2019 May 8;19(1):96. doi: 10.1186/s12911-019-0814-z.
Assessing risks of bias in randomized controlled trials (RCTs) is an important but laborious task when conducting systematic reviews. RobotReviewer (RR), an open-source machine learning (ML) system, semi-automates bias assessments. We conducted a user study of RobotReviewer, evaluating time saved and usability of the tool.
Systematic reviewers applied the Cochrane Risk of Bias tool (Version 1) to four randomly selected RCT articles. For each bias domain in the tool, reviewers judged whether an RCT was at low or high/unclear risk of bias, and highlighted the article text justifying their decision. For a random two of the four articles, the process was semi-automated: users were provided with ML-suggested bias judgments and text highlights, which they could amend if necessary. We measured the time taken for the task and how often users accepted the ML suggestions, assessed usability via the System Usability Scale (SUS), and collected qualitative feedback.
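For readers unfamiliar with the SUS, the following is a minimal sketch of the standard scoring rule for the 10-item questionnaire (odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5). The function name `sus_score` and the input format are assumptions made for illustration; this is not code from the study.

```python
def sus_score(responses):
    """Score one participant's SUS questionnaire on the 0-100 scale.

    `responses` is a hypothetical input format: ten integers from 1
    (strongly disagree) to 5 (strongly agree), in questionnaire order.
    """
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if not 1 <= response <= 5:
            raise ValueError("each response must be between 1 and 5")
        if item_number % 2 == 1:
            # Odd items are positively worded: higher agreement is better.
            total += response - 1
        else:
            # Even items are negatively worded: lower agreement is better.
            total += 5 - response
    # Rescale the 0-40 raw total to 0-100.
    return total * 2.5
```

A study-level figure such as a mean SUS score would then be the average of these per-participant scores.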
For 41 volunteers, semi-automation was quicker than manual assessment (mean 755 vs. 824 s; relative time 0.75, 95% CI 0.62-0.92). Reviewers accepted 301/328 (91%) of the ML Risk of Bias (RoB) judgments, and 202/328 (62%) of the text highlights, without change. Overall, the ML-suggested text highlights had a recall of 0.90 (SD 0.14) and a precision of 0.87 (SD 0.21) with respect to the users' final versions. Reviewers gave the system a mean SUS score of 77.7, corresponding to a rating between "good" and "excellent".
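To make the recall and precision figures concrete, here is a minimal sketch of how a suggested highlight could be compared against a user's final version. It assumes highlights are stored as half-open character-offset spans and that overlap is counted per character; the abstract does not state the unit of comparison, so both the representation and the function name `highlight_precision_recall` are illustrative assumptions.

```python
def highlight_precision_recall(suggested, final):
    """Compare ML-suggested highlights with the user's final highlights.

    Both arguments are lists of (start, end) half-open character offsets
    (an assumed representation, not the study's data format).
    Returns (precision, recall).
    """
    def covered_chars(spans):
        chars = set()
        for start, end in spans:
            chars.update(range(start, end))
        return chars

    suggested_chars = covered_chars(suggested)
    final_chars = covered_chars(final)
    overlap = len(suggested_chars & final_chars)

    # Precision: share of suggested text that the user kept.
    precision = overlap / len(suggested_chars) if suggested_chars else 0.0
    # Recall: share of the user's final text that was already suggested.
    recall = overlap / len(final_chars) if final_chars else 0.0
    return precision, recall
```

Under this reading, a highlight accepted without change scores 1.0 on both measures, while text the user adds lowers recall and text the user deletes lowers precision.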
Semi-automation (where humans validate machine learning suggestions) can improve the efficiency of evidence synthesis. Our system was rated highly usable, and expedited bias assessment of RCTs.