Chinese Evidence-based Medicine Center, Cochrane China Center and National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu 610041, Sichuan, China.
J Clin Epidemiol. 2021 May;133:121-129. doi: 10.1016/j.jclinepi.2021.01.010. Epub 2021 Jan 21.
To examine whether the use of natural language processing (NLP) technology is effective in assisting rapid title and abstract screening when updating a systematic review.
Using the citations retrieved for a published systematic review, we trained and tested an NLP model intended to support rapid title and abstract screening when updating a systematic review. The model was a light gradient boosting machine (LightGBM), an ensemble learning classifier that integrates four pretrained Bidirectional Encoder Representations from Transformers (BERT) models. We divided the retrieved citations into two sets (ie, training and test sets). The model was trained on the training set, and its screening performance was assessed on the test set. The eligibility decisions made by two independent reviewers on these citations were treated as the reference standard.
The test set comprised 947 citations, of which our model included 340 and excluded 607, achieving 96% sensitivity and 78% specificity. Had the classifier's assessments been accepted in this case study, reviewers would have missed 8 of 180 eligible citations (4%), none of which were ultimately included in the systematic review after full-text assessment, while the screening workload would have decreased by 64.1%.
NLP technology using the ensemble learning method may effectively assist in rapid literature screening when updating systematic reviews.
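The reported sensitivity, specificity, and workload reduction can be checked from the counts given in the results. The sketch below is only an illustration of that arithmetic, not the authors' code: the `ScreeningCounts` class and its field names are hypothetical, the individual confusion-matrix cells are derived from the reported totals rather than stated in the abstract, and workload reduction is assumed to mean the share of test-set citations that the model excluded (607 of 947), which is the reading that matches the reported 64.1%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """Counts reported for the test set (field names are hypothetical)."""
    total: int            # citations in the test set
    model_included: int   # citations the model kept for human review
    eligible: int         # eligible citations per the reference standard
    missed_eligible: int  # eligible citations the model excluded

    @property
    def model_excluded(self) -> int:
        return self.total - self.model_included

    @property
    def true_positives(self) -> int:
        # Eligible citations that the model also included.
        return self.eligible - self.missed_eligible

    @property
    def true_negatives(self) -> int:
        # Model exclusions that were ineligible per the reference standard.
        return self.model_excluded - self.missed_eligible

    @property
    def false_positives(self) -> int:
        # Model inclusions that were ineligible per the reference standard.
        return self.model_included - self.true_positives

    def sensitivity(self) -> float:
        return self.true_positives / self.eligible

    def specificity(self) -> float:
        return self.true_negatives / (self.total - self.eligible)

    def workload_reduction(self) -> float:
        # Assumption: workload saved = share of citations the model excluded.
        return self.model_excluded / self.total


# Counts taken from the results paragraph of the abstract.
counts = ScreeningCounts(total=947, model_included=340,
                         eligible=180, missed_eligible=8)

print(f"sensitivity:        {counts.sensitivity():.0%}")
print(f"specificity:        {counts.specificity():.0%}")
print(f"workload reduction: {counts.workload_reduction():.1%}")
```

Under these assumptions the derived cells are 172 true positives, 8 false negatives, 168 false positives, and 599 true negatives, which reproduce the reported percentages after rounding.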