The emergence of large language models as tools in literature reviews: a large language model-assisted systematic review.

Authors

Scherbakov Dmitry, Hubig Nina, Jansari Vinita, Bakumenko Alexander, Lenert Leslie A

Affiliations

Biomedical Informatics Center, Department of Public Health Sciences, Medical University of South Carolina (MUSC), Charleston, SC 29403, United States.

Interdisciplinary Transformation University, OG 2 A-4040 Linz, Austria.

Publication information

J Am Med Inform Assoc. 2025 Jun 1;32(6):1071-1086. doi: 10.1093/jamia/ocaf063.

DOI: 10.1093/jamia/ocaf063
PMID: 40332983
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12089777/

Abstract

OBJECTIVES

This study aims to summarize the usage of large language models (LLMs) in the process of creating a scientific review by looking at the methodological papers that describe the use of LLMs in review automation and the review papers that mention they were made with the support of LLMs.

MATERIALS AND METHODS

The search was conducted in June 2024 in PubMed, Scopus, Dimensions, and Google Scholar by human reviewers. The screening and extraction process took place in Covidence with the help of an LLM add-on based on the OpenAI GPT-4o model. ChatGPT and Scite.ai were used to clean the data, generate the code for figures, and draft the manuscript.

RESULTS

Of the 3788 articles retrieved, 172 studies were deemed eligible for the final review. ChatGPT and GPT-based LLMs emerged as the dominant architecture for review automation (n = 126, 73.2%). A significant number of review automation projects were found, but only a limited number of papers (n = 26, 15.1%) were actual reviews that acknowledged LLM usage. Most citations focused on the automation of a particular stage of review, such as searching for publications (n = 60, 34.9%) and data extraction (n = 54, 31.4%). When comparing the pooled performance of GPT-based and BERT-based models, the former was better in data extraction, with a mean precision of 83.0% (SD = 10.4) and a recall of 86.0% (SD = 9.8).
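
As a rough illustration of the metrics reported above, the sketch below shows how precision and recall are commonly computed for one data-extraction evaluation, how per-study values could be pooled into a mean and standard deviation, and how a count is expressed as a share of the 172 included studies. The function names and the confusion counts are hypothetical and are not taken from the reviewed studies. The abstract does not say whether the pooled SD is a sample or population SD, so the sample SD is assumed here.

```python
from __future__ import annotations

from statistics import mean, stdev


def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float]:
    """Return (precision, recall) as percentages for one evaluation."""
    precision = 100.0 * true_pos / (true_pos + false_pos)
    recall = 100.0 * true_pos / (true_pos + false_neg)
    return precision, recall


def pool(values: list[float]) -> tuple[float, float]:
    """Return the unweighted mean and the sample standard deviation (assumed)."""
    return mean(values), stdev(values)


def share(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


# Hypothetical (true positive, false positive, false negative) counts,
# one tuple per study; illustration only, not data from the review.
studies = [(90, 10, 10), (80, 20, 5), (70, 30, 30)]

precisions = [precision_recall(*s)[0] for s in studies]
recalls = [precision_recall(*s)[1] for s in studies]

print("pooled precision (mean, SD):", pool(precisions))
print("pooled recall (mean, SD):", pool(recalls))

# Share of included studies that were reviews acknowledging LLM usage
# (n = 26 of 172, as stated in the abstract).
print("share of reviews acknowledging LLM use:", round(share(26, 172), 1))
```

An unweighted mean treats every study equally regardless of its size; a weighted pooling scheme would be an alternative design choice, and the abstract does not indicate which one the authors used.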

DISCUSSION AND CONCLUSION

Our LLM-assisted systematic review revealed a significant number of research projects related to review automation using LLMs. Despite limitations, such as lower accuracy of extraction for numeric data, we anticipate that LLMs will soon change the way scientific reviews are conducted.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5a8/12089777/8a4e3acddc59/ocaf063f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5a8/12089777/94a08152fc67/ocaf063f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5a8/12089777/8ca1bb354212/ocaf063f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5a8/12089777/42966bdf142b/ocaf063f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5a8/12089777/be6017b08042/ocaf063f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5a8/12089777/bb8c8c27b786/ocaf063f6.jpg
