Performance of LLMs in Citation Screening: A Comparison Across Datasets with Varied Inclusion Rates.

Authors

Zhang Zhihong, Nezhad Mohamad Javad Momeni, Gupta Pallavi, Topaz Maxim, Zolnoori Maryam

Affiliations

Data Science Institute, Columbia University, New York, NY 10027, United States.

School of Nursing, Columbia University, New York, NY 10032.

Publication information

Stud Health Technol Inform. 2025 Aug 7;329:1886-1887. doi: 10.3233/SHTI251264.

DOI: 10.3233/SHTI251264
PMID: 40776281

Abstract

Systematic reviews require time-intensive screening of titles, abstracts, and full texts to identify relevant studies. This study evaluates the potential of large language models (LLMs) to automate citation screening across three datasets with varying inclusion rates. Six LLMs were tested using zero- to five-shot in-context learning, with demonstrations selected by semantic similarity using PubMedBERT. Majority voting and ensemble learning were applied to enhance performance. Results showed that no single LLM consistently excelled across the datasets, and that sensitivity and specificity were influenced by inclusion rates. Overall, ensemble learning and majority voting improved performance in citation screening.
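
The abstract names three building blocks: similarity-based demonstration selection, majority voting across models, and sensitivity/specificity as evaluation metrics. The following is a minimal sketch of how these pieces could fit together, not the authors' implementation. It assumes embeddings are already computed (for example by PubMedBERT) and passed in as plain lists of floats. The function names, the "include"/"exclude" labels, and the rule that ties resolve to "include" are all illustrative assumptions and do not come from the paper.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_demonstrations(query_vec, pool, k):
    """Return the k labeled examples most similar to the query citation.

    pool is a list of (embedding, text, label) tuples. The embeddings are
    assumed to be precomputed elsewhere (hypothetically by PubMedBERT).
    """
    ranked = sorted(pool, key=lambda ex: cosine(query_vec, ex[0]), reverse=True)
    return ranked[:k]


def majority_vote(votes, tie_label="include"):
    """Combine per-model decisions; ties go to tie_label (an assumption)."""
    counts = Counter(votes)
    if counts["include"] == counts["exclude"]:
        return tie_label
    return "include" if counts["include"] > counts["exclude"] else "exclude"


def sensitivity_specificity(gold, pred):
    """Return (sensitivity, specificity); None where a class is absent."""
    pairs = list(zip(gold, pred))
    tp = sum(g == "include" and p == "include" for g, p in pairs)
    fn = sum(g == "include" and p == "exclude" for g, p in pairs)
    tn = sum(g == "exclude" and p == "exclude" for g, p in pairs)
    fp = sum(g == "exclude" and p == "include" for g, p in pairs)
    sens = tp / (tp + fn) if (tp + fn) else None
    spec = tn / (tn + fp) if (tn + fp) else None
    return sens, spec


# Toy usage with made-up 2-D embeddings and made-up model votes.
pool = [
    ([1.0, 0.0], "citation a", "include"),
    ([0.0, 1.0], "citation b", "exclude"),
    ([0.9, 0.1], "citation c", "include"),
]
demos = select_demonstrations([1.0, 0.0], pool, k=2)
decision = majority_vote(["include", "exclude", "include"])
```

Resolving ties toward "include" favors sensitivity over specificity, which is one common preference in screening because a missed relevant study is usually costlier than an extra full-text check. A different tie rule can be set through the tie_label parameter.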
