Wiwatthanasetthakarn Phongphat, Ponthongmak Wanchana, Looareesuwan Panu, Tansawet Amarit, Numthavaj Pawin, McKay Gareth J, Attia John, Thakkinstian Ammarin
Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand.
Department of Research and Medical Innovation, Faculty of Medicine Vajira Hospital, Navamindradhiraj University, Bangkok, Thailand.
J Med Internet Res. 2024 Dec 11;26:e56863. doi: 10.2196/56863.
Systematic reviews (SRs) are considered the highest level of evidence, but their rigorous literature screening process can be time-consuming and resource-intensive. This is particularly challenging given the rapid pace of medical advancements, which can quickly make SRs outdated. Few-shot learning (FSL), a machine learning approach that learns effectively from limited data, offers a potential solution to streamline this process. Sentence-BERT (S-BERT), a sentence-embedding extension of bidirectional encoder representations from transformers, is particularly promising for identifying relevant studies from fewer examples.
This study aimed to develop an FSL-based model framework to efficiently screen and select relevant studies for inclusion in SRs, reducing workload while maintaining a high recall rate.
We developed and validated the FSL model framework using 9 previously published SR projects (2016-2018). The framework used S-BERT with titles and abstracts as input data. Key evaluation metrics, including workload reduction, cosine similarity score, and the number needed to screen at 100% recall, were estimated to determine the optimal number of eligible studies for model training. A prospective evaluation phase involving 4 ongoing SRs was then conducted. Study selection by FSL and a secondary reviewer were compared with the principal reviewer (considered the gold standard) to estimate the false negative rate.
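The screening step described above ranks candidate studies by cosine similarity between their embeddings and those of a few known-eligible studies. A minimal sketch of that idea (our reconstruction, not the authors' code), assuming title-plus-abstract embeddings have already been produced by an S-BERT model:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def screen_candidates(eligible_embeddings: np.ndarray,
                      candidate_embeddings: np.ndarray,
                      threshold: float):
    """Score each candidate against the centroid of the few-shot
    eligible studies; keep candidates at or above the threshold.

    Returns (scores, indices_of_selected_candidates).
    """
    centroid = eligible_embeddings.mean(axis=0)
    scores = [cosine_similarity(centroid, c) for c in candidate_embeddings]
    selected = [i for i, s in enumerate(scores) if s >= threshold]
    return scores, selected
```

In practice the embeddings would come from an S-BERT encoder applied to each study's title and abstract, and the threshold would be tuned (as in the paper, per SR project) to trade recall against workload reduction.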
Model development suggested an optimal range of 4-12 eligible studies for FSL training. Using 4-6 eligible studies during model development resulted in similarity thresholds for 100% recall ranging from 0.432 to 0.636, corresponding to a workload reduction of 51.11% (95% CI 46.36-55.86) to 97.67% (95% CI 96.76-98.58). The prospective evaluation of 4 SRs targeted a 50% workload reduction, yielding numbers needed to screen of 497 to 1035 out of 995 to 2070 studies. The false negative rate ranged from 1.87% to 12.20% for the FSL model and from 5% to 56.48% for the second reviewer, compared with the principal reviewer.
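The reported metrics follow directly from the screening counts. A minimal sketch of how workload reduction and false negative rate are typically computed in this setting (our reconstruction, not the authors' code):

```python
def workload_reduction(n_screened: int, n_total: int) -> float:
    """Percentage of retrieved studies the reviewer no longer needs
    to screen manually after automated filtering."""
    return 100.0 * (1.0 - n_screened / n_total)

def false_negative_rate(n_missed_eligible: int, n_eligible_total: int) -> float:
    """Percentage of truly eligible studies (per the gold-standard
    reviewer) that the screener failed to select."""
    return 100.0 * n_missed_eligible / n_eligible_total
```

For example, screening 1035 of 2070 retrieved studies corresponds to the paper's targeted 50% workload reduction.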
Our FSL framework demonstrates the potential for reducing workload in SR screening by over 50%. However, the model did not achieve 100% recall at this threshold, highlighting the potential for omitting eligible studies. Future work should focus on developing a web application to implement the FSL framework, making it accessible to researchers.