Department of Information and Computing Science, Faculty of Science, Utrecht University, Utrecht, The Netherlands.
Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, The Netherlands.
Syst Rev. 2024 Jul 8;13(1):175. doi: 10.1186/s13643-024-02587-0.
Software that employs screening prioritization through active learning (AL) has accelerated the screening process significantly by ranking an unordered set of records by their predicted relevance. However, failing to find a relevant paper might alter the findings of a systematic review, highlighting the importance of identifying elusive papers. The time to discovery (TD) measures how many records must be screened before a given relevant paper is found, making it a helpful tool for detecting such papers. The main aim of this project was to investigate how the choice of model and prior knowledge influences the TD values of hard-to-find relevant papers and their rank order. A simulation study was conducted, mimicking the screening process on a dataset containing the titles, abstracts, and labels used for an already published systematic review. The results demonstrated that the choice of AL model, and especially the choice of feature extractor, significantly influenced the TD values and the rank order of the elusive relevant papers, whereas the choice of prior knowledge did not. Future research should examine the characteristics of elusive relevant papers to discover why they might take a long time to be found.
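As an illustration of the TD measure described above, the following minimal sketch computes, for a single screening order, the position at which each relevant record is found, and then ranks the relevant records from hardest to easiest to find. The function name `time_to_discovery`, the record identifiers, and the 1-based counting are assumptions made for this example; they are not taken from the study's software or data.

```python
from typing import Dict, Hashable, Sequence, Set


def time_to_discovery(
    screening_order: Sequence[Hashable],
    relevant_ids: Set[Hashable],
) -> Dict[Hashable, int]:
    """Map each relevant record to the number of records screened when it is found.

    Assumption: counting is 1-based, so the first screened record has TD = 1.
    """
    td: Dict[Hashable, int] = {}
    for position, record_id in enumerate(screening_order, start=1):
        if record_id in relevant_ids and record_id not in td:
            td[record_id] = position
    return td


# Hypothetical screening order from one simulated run; "r2", "r5", "r9" are relevant.
order = ["r5", "r1", "r2", "r7", "r3", "r9"]
relevant = {"r2", "r5", "r9"}

td = time_to_discovery(order, relevant)

# Relevant records sorted from highest TD (most elusive) to lowest.
hardest_first = sorted(td, key=td.get, reverse=True)

print(td)
print(hardest_first)
```

Running the same calculation on the screening orders produced by different models or different prior knowledge would allow the TD values and rank orders of the elusive records to be compared across conditions.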