EPPI-Centre, UCL Social Research Institute, University College London, London, UK.
Res Synth Methods. 2022 Jan;13(1):121-133. doi: 10.1002/jrsm.1537. Epub 2021 Nov 25.
Manual screening of citation records could be reduced by using machine classifiers to remove records of very low relevance. This seems particularly feasible for update searches, where a machine classifier can be trained on past screening decisions. However, feasibility is unclear for broad topics. We evaluate the performance and implementation of machine classifiers for update searches of public health research using two case studies. The first study evaluates the impact of different sets of training data on classifier performance, comparing recall and screening reduction against a manual screening 'gold standard'. The second study uses screening decisions from a review to train a classifier that is then applied to rank the update search results. A stopping threshold was applied in the absence of a gold standard, and the time spent screening titles and abstracts of records at different relevancy rankings was measured. Results: In study one, classifier performance varied according to the training data used; all custom-built classifiers had a recall above 93% at the same threshold, achieving screening reductions between 41% and 74%. In study two, applying a classifier provided a way to handle the large volume of results from the update search, and screening volume was reduced by 61%. A tentative estimate indicates that over 25 h of screening time was saved. In conclusion, custom-built machine classifiers are feasible for reducing screening workload from update searches across a range of public health interventions, with some limitation on recall. Key considerations include selecting a training dataset, agreeing stopping thresholds and processes to ensure smooth workflows.
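The two outcome measures named in the abstract, recall and screening reduction at a threshold, can be illustrated with a minimal sketch. It assumes that recall is the share of gold-standard relevant records scoring at or above the threshold, and that screening reduction is the share of all records falling below it and therefore not screened manually; the abstract does not state these definitions. The function name `evaluate_threshold`, the scores, and the labels are hypothetical toy values, not data from the study.

```python
from typing import List, Tuple


def evaluate_threshold(
    scores: List[float], relevant: List[bool], threshold: float
) -> Tuple[float, float]:
    """Return (recall, screening_reduction) for a score threshold.

    Assumption: records scoring >= threshold are screened manually;
    records below it are set aside without screening.
    """
    if not scores or len(scores) != len(relevant):
        raise ValueError("scores and relevant must be non-empty and equal in length")

    # Which records would still be screened by hand at this threshold.
    screened = [score >= threshold for score in scores]

    total_relevant = sum(relevant)
    found = sum(1 for keep, rel in zip(screened, relevant) if keep and rel)

    # Recall: fraction of gold-standard relevant records that remain in the screened set.
    recall = found / total_relevant if total_relevant else 1.0
    # Screening reduction: fraction of all records that no longer need manual screening.
    reduction = 1 - sum(screened) / len(scores)
    return recall, reduction


# Toy classifier scores and gold-standard labels (hypothetical).
toy_scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
toy_relevant = [True, True, False, True, False, False, False, False]

recall, reduction = evaluate_threshold(toy_scores, toy_relevant, threshold=0.35)
print(f"recall={recall:.2f}, screening reduction={reduction:.2f}")
```

In this toy case, raising the threshold from 0.35 to 0.5 increases the screening reduction but drops one relevant record, which is the trade-off a stopping threshold has to balance when no gold standard is available.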