Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, The Netherlands.
Institute for Laboratory Animal Science, Hannover Medical School, Hannover, Germany.
Syst Rev. 2024 Feb 17;13(1):69. doi: 10.1186/s13643-024-02472-w.
Systematic reviews and meta-analyses typically require considerable time and effort. Machine learning models have the potential to make the screening phase of these processes more efficient. Evaluating such models effectively requires fully labeled datasets, which record every item screened by humans together with the corresponding labeling decision. This paper describes the construction of such a dataset, intended for running a simulation study, based on the systematic review of treatments for Borderline Personality Disorder reported by Oud et al. (2018). The original authors adhered to the PRISMA guidelines and published both the search query and the list of included records, but did not disclose the complete dataset with all labels. We replicated their search. Because the initial screening data were unavailable, we introduced a Noisy Label Filter (NLF) procedure that uses active learning to validate noisy labels. Applying the NLF identified no additional relevant records. A simulation study using the reconstructed dataset showed that active learning could reduce screening time by 82.30% compared with random reading. The paper discusses potential causes of discrepancies, provides recommendations, and introduces a decision tree to assist in reconstructing datasets for simulation studies.