Transforming literature screening: The emerging role of large language models in systematic reviews.

Authors

Delgado-Chaves Fernando M, Jennings Matthew J, Atalaia Antonio, Wolff Justus, Horvath Rita, Mamdouh Zeinab M, Baumbach Jan, Baumbach Linda

Affiliations

Institute for Computational Systems Biology, Faculty of Mathematics, Informatics and Natural Sciences, University of Hamburg, Hamburg 22761, Germany.

Center for Motor Neuron Biology and Diseases, Department of Neurology Columbia University, New York, NY 10032.

Publication information

Proc Natl Acad Sci U S A. 2025 Jan 14;122(2):e2411962122. doi: 10.1073/pnas.2411962122. Epub 2025 Jan 6.

DOI:10.1073/pnas.2411962122
PMID:39761403
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11745399/
Abstract

Systematic reviews (SRs) synthesize evidence-based medical literature, but they involve labor-intensive manual article screening. Large language models (LLMs) can select relevant literature, but their quality and efficacy compared to humans are still being determined. We evaluated the overlap between articles selected by 18 different LLMs on the basis of titles and abstracts and articles selected by humans for three SRs. In the three SRs, 185/4,662, 122/1,741, and 45/66 articles were selected and considered for full-text screening by two independent reviewers. Due to technical variations and the inability of the LLMs to classify all records, the sample sizes considered for the LLMs were smaller. However, on average, the 18 LLMs correctly classified 4,294 (min 4,130; max 4,329), 1,539 (min 1,449; max 1,574), and 27 (min 22; max 37) of the titles and abstracts as either included or excluded for the three SRs, respectively. Additional analysis revealed that the definitions of the inclusion criteria and the conceptual designs significantly influenced LLM performance. In conclusion, LLMs can reduce one reviewer's workload by between 33% and 93% during title and abstract screening. However, the exact formulation of the inclusion and exclusion criteria should be refined beforehand so that the LLMs can provide ideal support.
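
To put the reported counts on a common scale, the sketch below converts them into the share of human-screened records that the LLMs classified correctly. It is an illustration only, not the authors' analysis: the names `REPORTED`, `share_correct`, and `summarize` are hypothetical, and the denominators are the full record counts from the abstract (4,662, 1,741, and 66). Because the abstract states that the LLMs classified fewer records than that, these shares should be read as lower bounds on accuracy among the records the LLMs actually classified.

```python
# Counts copied from the abstract. "total" is the number of records screened
# by the human reviewers; "mean", "min", and "max" are the numbers of records
# the 18 LLMs classified correctly (as included or excluded).
# Assumption: the full human-screened total is used as the denominator,
# although the LLMs' effective sample sizes were somewhat smaller.
REPORTED = {
    "SR1": {"total": 4662, "mean": 4294, "min": 4130, "max": 4329},
    "SR2": {"total": 1741, "mean": 1539, "min": 1449, "max": 1574},
    "SR3": {"total": 66, "mean": 27, "min": 22, "max": 37},
}


def share_correct(correct: int, total: int) -> float:
    """Return the percentage of `total` records that were classified correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    return 100.0 * correct / total


def summarize(reported: dict) -> dict:
    """Convert raw counts into percentages (one decimal place) for each review."""
    return {
        name: {
            key: round(share_correct(counts[key], counts["total"]), 1)
            for key in ("mean", "min", "max")
        }
        for name, counts in reported.items()
    }


if __name__ == "__main__":
    for name, shares in summarize(REPORTED).items():
        print(f"{name}: mean {shares['mean']}%, "
              f"range {shares['min']}% to {shares['max']}%")
```

Under this assumption the lowest and highest shares (22/66 and 4,329/4,662) come out at roughly 33% and 93%, which is close to the workload-reduction range quoted in the abstract, although the abstract does not state how that range was derived.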

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/66af/11745399/9cf128ca26c8/pnas.2411962122fig02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/66af/11745399/93ff252dc095/pnas.2411962122fig01.jpg
