LitAutoScreener: Development and Validation of an Automated Literature Screening Tool in Evidence-Based Medicine Driven by Large Language Models

Author information

Tao Yiming, Li Xuehu, Yisha Zuhar, Yang Sihan, Zhan Siyan, Sun Feng

Affiliations

Key Laboratory of Epidemiology of Major Diseases, Ministry of Education/Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China.

School of Cybersecurity, Hainan University, Hainan, China.

Publication information

Health Data Sci. 2025 Sep 2;5:0322. doi: 10.34133/hds.0322. eCollection 2025.

DOI:10.34133/hds.0322

PMID:40904687

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12404845/

Abstract

Traditional manual literature screening is time-consuming and labor-intensive. A pressing question is how to leverage large language models to improve the efficiency and quality of evidence-based evaluations of drug efficacy and safety. This study used a manually curated reference literature database, comprising vaccine, hypoglycemic agent, and antidepressant evaluation studies, that our team had previously developed through conventional systematic review methods. This validated database served as the gold standard for developing and optimizing LitAutoScreener. Following the PICOS (Population, Intervention, Comparison, Outcomes, Study Design) principles, we developed the screening algorithm using chain-of-thought reasoning with few-shot learning prompts. We then evaluated LitAutoScreener on 2 independent validation cohorts, assessing both classification accuracy and processing efficiency. In title-abstract screening for the respiratory syncytial virus vaccine safety validation, our tools based on GPT (GPT-4o), Kimi (moonshot-v1-128k), and DeepSeek (deepseek-chat 2.5) made inclusion/exclusion decisions with high accuracy (99.38%, 98.94%, and 98.85%, respectively). Recall rates were 100.00%, 99.13%, and 98.26%, a statistically significant difference (χ² = 5.99, P = 0.048), with GPT outperforming the other models. Exclusion reason concordance rates were 98.85%, 94.79%, and 96.47% (χ² = 30.22, P < 0.001). In full-text screening, all models maintained perfect recall (100.00%), with accuracies of 100.00% (GPT), 100.00% (Kimi), and 99.45% (DeepSeek). Processing times averaged 1 to 5 s per article for title-abstract screening and 60 s per article for full-text screening (including PDF preprocessing). LitAutoScreener offers a new approach to efficient literature screening in drug intervention studies, achieving high accuracy while substantially reducing screening time.
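As a rough illustration of the workflow the abstract describes, the sketch below assembles a PICOS-structured few-shot prompt that asks for step-by-step reasoning, and computes accuracy and recall of screening decisions against a gold standard. It is not the authors' implementation: the prompt wording, the `Example` class, and the function names `build_prompt` and `accuracy_and_recall` are hypothetical, and the call to an actual language model is left out.

```python
from dataclasses import dataclass

# PICOS elements named in the abstract.
PICOS_FIELDS = ("Population", "Intervention", "Comparison", "Outcomes", "Study design")


@dataclass
class Example:
    """A hypothetical worked example used as a few-shot demonstration."""
    title: str
    abstract: str
    reasoning: str  # step-by-step rationale shown to the model
    decision: str   # "include" or "exclude"


def build_prompt(criteria: dict[str, str], examples: list[Example],
                 title: str, abstract: str) -> str:
    """Assemble a PICOS-based, few-shot, chain-of-thought screening prompt.

    The wording is an assumption for illustration only.
    """
    lines = ["Check the record against each PICOS criterion, reasoning step by step,",
             "then answer with a final decision: include or exclude.", ""]
    for field in PICOS_FIELDS:
        lines.append(f"{field}: {criteria[field]}")
    for i, ex in enumerate(examples, start=1):
        lines += ["", f"Example {i}", f"Title: {ex.title}", f"Abstract: {ex.abstract}",
                  f"Reasoning: {ex.reasoning}", f"Decision: {ex.decision}"]
    lines += ["", "Record to screen", f"Title: {title}", f"Abstract: {abstract}",
              "Reasoning:"]
    return "\n".join(lines)


def accuracy_and_recall(gold: list[bool], predicted: list[bool]) -> tuple[float, float]:
    """Return (accuracy, recall), where True means 'include'.

    Recall is the share of gold-standard inclusions that the tool also included.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and of equal length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    true_pos = sum(g and p for g, p in zip(gold, predicted))
    relevant = sum(gold)
    accuracy = correct / len(gold)
    recall = true_pos / relevant if relevant else float("nan")
    return accuracy, recall


if __name__ == "__main__":
    criteria = {field: "(criterion text goes here)" for field in PICOS_FIELDS}
    demo = Example("Placeholder title", "Placeholder abstract",
                   "Placeholder reasoning over each PICOS element.", "exclude")
    print(build_prompt(criteria, [demo], "New record title", "New record abstract"))
    print(accuracy_and_recall([True, True, False, False], [True, False, False, False]))
```

In screening, recall matters more than raw accuracy because a missed eligible study cannot be recovered later, which is presumably why the abstract reports the two separately.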

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9ee/12404845/3ee60da8d1c3/hds.0322.fig.001.jpg
