Center for Clinical Excellence and Guidelines, ECRI Institute, Evidence-based Practice Center, 5200, Plymouth Meeting, PA, 19462-1298, USA.
Syst Rev. 2020 Apr 2;9(1):73. doi: 10.1186/s13643-020-01324-7.
Improving the speed of systematic review (SR) development is key to supporting evidence-based medicine. Machine learning tools that semi-automate citation screening might improve efficiency. Few studies have assessed the use of screening prioritization functionality or compared two tools head to head. In this project, we compared the performance of two machine-learning tools for potential use in citation screening.
Using 9 evidence reports previously completed by the ECRI Institute Evidence-based Practice Center team, we compared the performance of Abstrackr and EPPI-Reviewer, two off-the-shelf citation screening tools, for identifying relevant citations. Screening prioritization functionality was tested for 3 large reports and 6 small reports on a range of clinical topics. The large report topics were imaging for pancreatic cancer, indoor allergen reduction, and inguinal hernia repair. We trained Abstrackr and EPPI-Reviewer and screened all citations in 10% increments. In Task 1, we input whether an abstract was ordered for full-text screening; in Task 2, we input whether an abstract was included in the final report. For both tasks, screening continued until all studies ordered and included for the actual reports were identified. We assessed the potential reduction in hypothetical screening burden (the proportion of citations screened to identify all included studies) offered by each tool for all 9 reports.
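As a minimal, hypothetical sketch of the screening-burden metric defined above (this is not code from the study; the function and variable names are our own), the following Python computes the proportion of a prioritized citation list that must be screened before every relevant citation is found:

def screening_burden(ranking, relevant):
    """Proportion of citations screened, in the tool's prioritized order,
    before every relevant citation has been seen."""
    remaining = set(relevant)  # citations ordered/included in the actual report
    for n_screened, citation_id in enumerate(ranking, start=1):
        remaining.discard(citation_id)
        if not remaining:
            return n_screened / len(ranking)
    raise ValueError("ranking does not contain every relevant citation")

# Example: both relevant citations appear within the first 6 of 10 ranked
# citations, so the burden is 0.6 and the potential reduction is 40%.
burden = screening_burden(list("abcdefghij"), {"c", "f"})
print(f"burden = {burden:.0%}, potential reduction = {1 - burden:.0%}")

Under this definition, a potential reduction in screening burden is simply 1 minus the burden, so the 4 to 60% reductions reported below correspond to stopping after screening 96% down to 40% of the prioritized citations.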
For the 3 large reports, both EPPI-Reviewer and Abstrackr performed well, with potential reductions in screening burden of 4 to 49% (Abstrackr) and 9 to 60% (EPPI-Reviewer). Both tools performed markedly worse on 1 large report (inguinal hernia), possibly because of its heterogeneous key questions. Based on McNemar's test for paired proportions in the 3 large reports, EPPI-Reviewer outperformed Abstrackr at identifying articles ordered for full-text review, but Abstrackr performed better in 2 of the 3 reports at identifying articles included in the final report. For the small reports, both tools provided benefits, but EPPI-Reviewer generally outperformed Abstrackr on both tasks, although these results were often not statistically significant.
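For readers unfamiliar with the test, McNemar's test compares paired proportions using only the discordant pairs. An illustrative Python sketch using statsmodels follows; the 2x2 counts are invented for illustration and are not the paper's data:

from statsmodels.stats.contingency_tables import mcnemar

# Paired 2x2 table: whether Abstrackr (rows) and EPPI-Reviewer (columns)
# had each surfaced a relevant citation by a given screening cutoff
# (row/column 0 = not yet found, 1 = found). Counts here are invented.
table = [[10, 25],
         [12, 150]]

# The exact test is a binomial test on the discordant cells (25 vs 12);
# the concordant diagonal cells do not affect the result.
result = mcnemar(table, exact=True)
print(f"statistic = {result.statistic}, p = {result.pvalue:.4f}")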
Abstrackr and EPPI-Reviewer both performed well, but prioritization accuracy varied greatly across reports. Our work suggests that screening prioritization functionality is a promising approach, offering efficiency gains without giving up human involvement in the screening process.