Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, Scotland.
Translational Neuropsychiatry Unit, Aarhus University, Aarhus, Denmark.
Syst Rev. 2019 Jan 15;8(1):23. doi: 10.1186/s13643-019-0942-7.
Here, we outline a method of applying existing machine learning (ML) approaches to aid citation screening in an ongoing broad and shallow systematic review of preclinical animal studies. The aim is to achieve a high-performing algorithm, comparable to human screening, that can reduce the human resources required for carrying out this step of a systematic review.
We applied ML approaches to a broad systematic review of animal models of depression at the citation screening stage. We tested two independently developed ML approaches which used different classification models and feature sets. We recorded the performance of the ML approaches on an unseen validation set of papers using sensitivity, specificity and accuracy. We aimed to achieve 95% sensitivity and to maximise specificity. The classification model providing the most accurate predictions was applied to the remaining unseen records in the dataset and will be used in the next stage of the preclinical biomedical sciences systematic review. We used a cross-validation technique to assign ML inclusion likelihood scores to the human screened records, to identify potential errors made during the human screening process (error analysis).
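The screening goal described above (reach a target sensitivity, then maximise specificity) can be illustrated with a minimal sketch. It assumes each record already has a human label (1 = include, 0 = exclude) and an ML inclusion likelihood score between 0 and 1. The function names and the toy data are hypothetical and are not taken from the study.

```python
def confusion_counts(labels, scores, threshold):
    """Count TP, FN, TN, FP when records scoring >= threshold are predicted 'include'."""
    tp = fn = tn = fp = 0
    for label, score in zip(labels, scores):
        predicted_include = score >= threshold
        if label and predicted_include:
            tp += 1
        elif label:
            fn += 1
        elif predicted_include:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def screening_metrics(labels, scores, threshold):
    """Return (sensitivity, specificity, accuracy) at the given score threshold."""
    tp, fn, tn, fp = confusion_counts(labels, scores, threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / len(labels)
    return sensitivity, specificity, accuracy


def pick_threshold(labels, scores, target_sensitivity=0.95):
    """Return the highest threshold whose sensitivity meets the target.

    Raising the threshold can only keep or increase specificity, so the
    highest threshold that still meets the sensitivity target also gives
    the best specificity among the qualifying thresholds.
    """
    for threshold in sorted(set(scores), reverse=True):
        sens, spec, acc = screening_metrics(labels, scores, threshold)
        if sens >= target_sensitivity:
            return threshold, sens, spec, acc
    raise ValueError("no threshold reaches the target sensitivity")


# Toy data (hypothetical): 4 relevant and 6 irrelevant records.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1, 0.05, 0.01]
threshold, sens, spec, acc = pick_threshold(labels, scores, target_sensitivity=0.95)
print(threshold, sens, round(spec, 3), acc)
```

In practice the threshold would be chosen on a validation set and then applied unchanged to unseen records, so that the reported sensitivity is not optimistically biased.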
ML approaches reached 98.7% sensitivity based on learning from a training set of 5749 records, with an inclusion prevalence of 13.2%. The highest level of specificity reached was 86%. Performance was assessed on an independent validation dataset. Human errors in the training and validation sets were successfully identified by using the inclusion likelihood assigned by the ML model to highlight discrepancies with the human decisions. Training the ML algorithm on the corrected dataset improved the specificity of the algorithm without compromising sensitivity. Correcting the errors identified by error analysis led to a 3% improvement in sensitivity and specificity, which increased the precision and accuracy of the ML algorithm.
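The idea of using inclusion likelihood scores to highlight possible human screening errors can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: it assumes every human-screened record already carries a score from a model that was not trained on that record (for example, an out-of-fold score from cross-validation), and the `min_gap` cutoff and record identifiers are hypothetical.

```python
def flag_discrepancies(records, min_gap=0.8):
    """Rank records where the human decision and the ML score disagree most.

    records: iterable of (record_id, human_label, ml_score), where
             human_label is 1 (include) or 0 (exclude) and ml_score is the
             inclusion likelihood in [0, 1].
    min_gap: hypothetical cutoff; only records whose disagreement
             |human_label - ml_score| is at least this large are returned.
    """
    flagged = []
    for record_id, human_label, ml_score in records:
        gap = abs(human_label - ml_score)
        if gap >= min_gap:
            flagged.append((record_id, human_label, ml_score, gap))
    # Largest disagreement first, so reviewers re-check the likeliest errors first.
    flagged.sort(key=lambda item: item[3], reverse=True)
    return flagged


# Toy data (hypothetical identifiers and scores).
records = [
    ("a", 1, 0.95),  # human include, model agrees
    ("b", 0, 0.92),  # human exclude, model strongly suggests include
    ("c", 1, 0.04),  # human include, model strongly suggests exclude
    ("d", 0, 0.10),  # human exclude, model agrees
    ("e", 0, 0.55),  # mild disagreement, below the cutoff
]
for record_id, human_label, ml_score, gap in flag_discrepancies(records):
    print(record_id, human_label, ml_score, round(gap, 2))
```

Flagged records are candidates for a second human check rather than automatic corrections; only labels confirmed as errors would be changed before retraining.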
This work has confirmed the performance and application of ML algorithms for screening in systematic reviews of preclinical animal studies. It has highlighted the novel use of ML algorithms to identify human error. This needs to be confirmed in other reviews with different inclusion prevalence levels, but represents a promising approach to integrating human decisions and automation in systematic review methodology.