Statistical stopping criteria for automated screening in systematic reviews.

Affiliations

Mercator Research Institute on Global Commons and Climate Change, EUREF Campus 19, Torgauer Straße 12-15, Berlin, 10829, Germany.

Priestley International Centre for Climate, University of Leeds, Leeds, LS2 9JT, UK.

Publication information

Syst Rev. 2020 Nov 28;9(1):273. doi: 10.1186/s13643-020-01521-4.

Abstract

Active learning for systematic review screening promises to reduce the human effort required to identify relevant documents for a systematic review. Machines and humans work together, with humans providing training data, and the machine optimising the documents that the humans screen. This enables the identification of all relevant documents after viewing only a fraction of the total documents. However, current approaches lack robust stopping criteria, so that reviewers do not know when they have seen all or a certain proportion of relevant documents. This means that such systems are hard to implement in live reviews. This paper introduces a workflow with flexible statistical stopping criteria, which offer real work reductions on the basis of rejecting a hypothesis of having missed a given recall target with a given level of confidence. The stopping criteria are shown on test datasets to achieve a reliable level of recall, while still providing work reductions of on average 17%. Other methods proposed previously are shown to provide inconsistent recall and work reductions across datasets.
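The stopping criterion described above rejects, at a chosen confidence level, the null hypothesis that the review has missed its recall target. The core of such a test can be sketched with a hypergeometric calculation: if the unscreened pool still contained enough relevant documents to put recall below the target, how surprising would it be to find so few of them in a random sample? The sketch below is a minimal illustration under our own simplifying assumptions (a uniform random sample of the unscreened pool; function names are ours), not the authors' exact implementation.

```python
from math import comb

def hypergeom_cdf(k, N, K, n):
    """P(X <= k) for X ~ Hypergeometric(N, K, n): drawing n documents
    without replacement from a pool of N that contains K relevant ones."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k + 1)) / total

def p_missed_recall_target(r_seen, k_sample, n_sample, n_unscreened, recall_target):
    """p-value for H0: achieved recall < recall_target.

    r_seen       -- relevant documents found so far (all screening phases)
    k_sample     -- relevant documents found within the random sample
    n_sample     -- size of the random sample of unscreened documents
    n_unscreened -- pool size when random sampling began
    """
    # Smallest number of still-missed relevant documents consistent with H0:
    # recall < target  =>  missed > r_seen * (1 - target) / target.
    m_min = int(r_seen * (1 - recall_target) / recall_target) + 1
    # Under H0 the pool held at least m_min missed docs plus the k_sample found.
    K = min(m_min + k_sample, n_unscreened)
    # Probability of finding at most k_sample relevant docs in the sample.
    return hypergeom_cdf(k_sample, n_unscreened, K, n_sample)
```

For example, having found 95 relevant documents, screening a random 500 of 1,000 remaining documents without finding another relevant one yields a small p-value against the hypothesis that 95% recall was missed, so screening could stop with roughly 50% of that pool unread. Stop when the p-value falls below 1 minus the desired confidence (e.g. 0.05 for 95% confidence).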


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f681/7700715/463a77956110/13643_2020_1521_Fig1_HTML.jpg
