Zhang Zhihong, Nezhad Mohamad Javad Momeni, Gupta Pallavi, Topaz Maxim, Zolnoori Maryam
Data Science Institute, Columbia University, New York, NY 10027, United States.
School of Nursing, Columbia University, New York, NY 10032.
Stud Health Technol Inform. 2025 Aug 7;329:1886-1887. doi: 10.3233/SHTI251264.
Systematic reviews require time-intensive screening of titles, abstracts, and full texts to identify relevant studies. This study evaluates the potential of large language models (LLMs) to automate citation screening across three datasets with varying inclusion rates. Six LLMs were tested using zero- to five-shot in-context learning, with demonstrations selected by semantic similarity computed with PubMedBERT. Majority voting and ensemble learning were applied to improve performance. Results showed that no single LLM consistently excelled across the datasets, and that sensitivity and specificity were influenced by inclusion rates. Overall, ensemble learning and majority voting improved citation screening performance.
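The pipeline described in the abstract (similarity-based demonstration selection, majority voting across models, and evaluation by sensitivity and specificity) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the "include"/"exclude" label strings, the tie-breaking rule (ties resolve to "include"), and the toy two-dimensional vectors are all assumptions made for illustration. In practice the vectors would presumably be PubMedBERT embeddings computed elsewhere, and the votes would come from the individual LLMs.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_demonstrations(query_vec, pool, k):
    """Return the k pool items most similar to the query.

    pool is a list of (citation_id, vector) pairs (hypothetical layout);
    the vectors stand in for precomputed embeddings.
    """
    ranked = sorted(pool, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return ranked[:k]


def majority_vote(votes, tie_break="include"):
    """Combine per-model decisions into one label.

    Assumption: a tie resolves to tie_break ("include" by default),
    which favors sensitivity over specificity.
    """
    counts = Counter(votes)
    n_include, n_exclude = counts["include"], counts["exclude"]
    if n_include == n_exclude:
        return tie_break
    return "include" if n_include > n_exclude else "exclude"


def sensitivity_specificity(predicted, actual):
    """Compute (sensitivity, specificity), treating 'include' as positive."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == "include" and a == "include" for p, a in pairs)
    fn = sum(p == "exclude" and a == "include" for p, a in pairs)
    tn = sum(p == "exclude" and a == "exclude" for p, a in pairs)
    fp = sum(p == "include" and a == "exclude" for p, a in pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy labeled pool; vectors are placeholders for real embeddings.
    pool = [("cit-a", [1.0, 0.0]), ("cit-b", [0.0, 1.0]), ("cit-c", [1.0, 1.0])]
    demos = select_demonstrations([1.0, 0.1], pool, k=2)
    print([citation_id for citation_id, _ in demos])

    # One list of votes per citation, one vote per model.
    per_citation_votes = [
        ["include", "exclude", "include"],
        ["exclude", "exclude", "include"],
    ]
    predicted = [majority_vote(v) for v in per_citation_votes]
    print(predicted, sensitivity_specificity(predicted, ["include", "exclude"]))
```

Because sensitivity depends only on the truly relevant citations and specificity only on the irrelevant ones, a dataset with a low inclusion rate yields a sensitivity estimate based on very few citations, which is one reason to report both metrics per dataset.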