EPPI-Centre, UCL Social Research Institute, University College London, London, UK.
Cochrane Australia, School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia.
J Clin Epidemiol. 2021 May;133:140-151. doi: 10.1016/j.jclinepi.2020.11.003. Epub 2020 Nov 7.
This study developed, calibrated, and evaluated a machine learning classifier designed to reduce the study identification workload involved in producing Cochrane systematic reviews.
A machine learning classifier for retrieving randomized controlled trials (RCTs), the "Cochrane RCT Classifier", was developed by training the algorithm on a data set of title-abstract records from Embase that had been manually labeled by the Cochrane Crowd. The classifier was then calibrated on a further data set of similar records, manually labeled by the Clinical Hedges team, with a target recall of 99%. Finally, the recall of the calibrated classifier was evaluated using records of RCTs included in Cochrane Reviews whose abstracts were long enough to allow machine classification.
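The calibration step can be illustrated with a minimal sketch. It assumes the classifier outputs one score per record and that a record counts as a predicted RCT when its score is at or above the threshold; the helper names `calibrate_threshold` and `recall_at` are hypothetical, and the abstract does not say exactly how the Cochrane threshold was chosen. The sketch picks the highest threshold at which recall on the manually labeled calibration records reaches the target.

```python
from typing import Sequence


def recall_at(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Fraction of labeled RCT records (label 1) scoring at or above threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in positives) / len(positives)


def calibrate_threshold(scores: Sequence[float], labels: Sequence[int],
                        target_recall: float = 0.99) -> float:
    """Return the highest threshold whose recall on the calibration set meets target_recall.

    Hypothetical helper: records with score >= threshold are treated as possible RCTs.
    """
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    # Scores of the manually labeled RCTs, best-scoring first.
    positive_scores = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positive_scores:
        raise ValueError("calibration set contains no RCT records")
    n = len(positive_scores)
    # Lowering the threshold to the k-th best RCT score retrieves at least k RCTs
    # (ties can only add more), so stop at the first k that reaches the target.
    for k, score in enumerate(positive_scores, start=1):
        if k / n >= target_recall:
            return score
    return positive_scores[-1]  # not reached: k == n always gives recall 1.0


# Toy calibration set: three labeled RCTs among six records.
scores = [0.95, 0.90, 0.40, 0.85, 0.30, 0.05]
labels = [1, 1, 0, 1, 0, 0]
t = calibrate_threshold(scores, labels, target_recall=0.99)  # t == 0.85
```

With the 1,587 RCT records in the calibration set described above, a 0.99 target means at least 1,572 of them must score at or above the chosen threshold (1,571/1,587 ≈ 0.9899 falls just short).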
The Cochrane RCT Classifier was trained using 280,620 records (20,454 of which reported RCTs). A classification threshold was set using 49,025 calibration records (1,587 of which reported RCTs), and our bootstrap validation found the classifier had recall of 0.99 (95% confidence interval 0.98-0.99) and precision of 0.08 (95% confidence interval 0.06-0.12) in this data set. The final, calibrated RCT classifier correctly retrieved 43,783 (99.5%) of 44,007 RCTs included in Cochrane Reviews but missed 224 (0.5%). Older records were more likely to be missed than those more recently published.
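The abstract does not detail the bootstrap procedure. As a hedged illustration, the sketch below computes recall and precision for fixed binary predictions and a simple percentile bootstrap interval by resampling records with replacement. The helper names, the 1,000 resamples, and the percentile method are assumptions; the study may, for example, have re-selected the threshold within each resample. The last lines reproduce the evaluation-set arithmetic from the counts reported above.

```python
import math
import random
from typing import List, Sequence, Tuple


def recall_precision(preds: Sequence[int], labels: Sequence[int]) -> Tuple[float, float]:
    """Recall and precision of binary predictions (1 = predicted RCT); NaN if undefined."""
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    recall = tp / (tp + fn) if tp + fn else math.nan
    precision = tp / (tp + fp) if tp + fp else math.nan
    return recall, precision


def percentile_interval(values: List[float], alpha: float) -> Tuple[float, float]:
    """Approximate (alpha/2, 1 - alpha/2) percentiles by indexing the sorted values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(alpha / 2 * last)], ordered[int((1 - alpha / 2) * last)]


def bootstrap_ci(preds: Sequence[int], labels: Sequence[int], n_boot: int = 1000,
                 alpha: float = 0.05, seed: int = 0):
    """Percentile bootstrap intervals for recall and precision (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(labels)
    recalls, precisions = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample records with replacement
        r, p = recall_precision([preds[i] for i in idx], [labels[i] for i in idx])
        if not math.isnan(r):
            recalls.append(r)
        if not math.isnan(p):
            precisions.append(p)
    return percentile_interval(recalls, alpha), percentile_interval(precisions, alpha)


# Recall on the Cochrane Reviews evaluation set, from the reported counts.
retrieved, total = 43_783, 44_007
print(f"recall = {retrieved / total:.3f}, missed = {total - retrieved}")  # recall = 0.995, missed = 224
```

Resampling whole records keeps each record's prediction and label together, so the interval reflects variation in which records happen to be in the calibration set rather than variation in the classifier itself.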
The Cochrane RCT Classifier can reduce manual study identification workload for Cochrane Reviews, with a very low and acceptable risk of missing eligible RCTs. This classifier now forms part of the Evidence Pipeline, an integrated workflow deployed within Cochrane to help improve the efficiency of the study identification processes that support systematic review production.