

LitAutoScreener: Development and Validation of an Automated Literature Screening Tool in Evidence-Based Medicine Driven by Large Language Models.

Author Information

Tao Yiming, Li Xuehu, Yisha Zuhar, Yang Sihan, Zhan Siyan, Sun Feng

Affiliations

Key Laboratory of Epidemiology of Major Diseases, Ministry of Education/Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China.

School of Cybersecurity, Hainan University, Hainan, China.

Publication Information

Health Data Sci. 2025 Sep 2;5:0322. doi: 10.34133/hds.0322. eCollection 2025.

Abstract

The traditional manual approach to literature screening is limited by its time-consuming nature and high labor costs. A pressing issue is how to leverage large language models to improve the efficiency and quality of evidence-based evaluations of drug efficacy and safety. This study used a manually curated reference literature database (comprising vaccine, hypoglycemic agent, and antidepressant evaluation studies) previously developed by our team through conventional systematic review methods. This validated database served as the gold standard for the development and optimization of LitAutoScreener. Following the PICOS (Population, Intervention, Comparison, Outcomes, Study Design) principles, a chain-of-thought reasoning approach with few-shot learning prompts was implemented to develop the screening algorithm. We then evaluated the performance of LitAutoScreener in 2 independent validation cohorts, assessing both classification accuracy and processing efficiency. For title-abstract screening in the respiratory syncytial virus vaccine safety validation cohort, our tools based on GPT (GPT-4o), Kimi (moonshot-v1-128k), and DeepSeek (deepseek-chat 2.5) demonstrated high accuracy in inclusion/exclusion decisions (99.38%, 98.94%, and 98.85%, respectively). Recall rates were 100.00%, 99.13%, and 98.26%, with statistically significant performance differences (χ² = 5.99, P = 0.048), where GPT outperformed the other models. Exclusion reason concordance rates were 98.85%, 94.79%, and 96.47% (χ² = 30.22, P < 0.001). In full-text screening, all models maintained perfect recall (100.00%), with accuracies of 100.00% (GPT), 100.00% (Kimi), and 99.45% (DeepSeek). Processing times averaged 1 to 5 s per article for title-abstract screening and 60 s for full-text processing (including PDF preprocessing). LitAutoScreener offers a new approach for efficient literature screening in drug intervention studies, achieving high accuracy and significantly improving screening efficiency.
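The abstract does not publish the tool's implementation. As an illustration only, the following Python sketch shows how a PICOS-guided, few-shot, chain-of-thought title-abstract screening call could be structured against an OpenAI-compatible chat API (GPT-4o is one of the models named in the study); the prompt wording, criteria text, function names, and JSON output schema below are assumptions, not the authors' code.

```python
# Illustrative sketch only: a PICOS-guided title-abstract screening call.
# The criteria, few-shot example, and output schema are assumptions; the
# study's actual prompts and pipeline are not given in this abstract.
import json
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PICOS_CRITERIA = """\
Population: adults eligible for respiratory syncytial virus (RSV) vaccination
Intervention: any RSV vaccine
Comparison: placebo or another vaccine
Outcomes: safety outcomes (adverse events)
Study design: randomized controlled trials
"""

FEW_SHOT_EXAMPLE = (
    "Title: Immunogenicity and safety of an RSV prefusion F vaccine in older adults\n"
    "Abstract: ...randomized, placebo-controlled trial reporting solicited adverse events...\n"
    "Reasoning: Population, intervention, comparator, safety outcomes, and RCT design all match.\n"
    'Answer: {"decision": "include", "exclusion_reason": null}'
)

def screen_title_abstract(title: str, abstract: str, model: str = "gpt-4o") -> dict:
    """Ask the model for an include/exclude decision plus a stated exclusion reason."""
    prompt = (
        f"Screening criteria (PICOS):\n{PICOS_CRITERIA}\n"
        f"Worked example:\n{FEW_SHOT_EXAMPLE}\n\n"
        "Now reason step by step against each PICOS element, then finish with a JSON "
        'object of the form {"decision": "include"|"exclude", "exclusion_reason": str|null}.\n\n'
        f"Title: {title}\nAbstract: {abstract}"
    )
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": "You are a systematic-review literature screener."},
            {"role": "user", "content": prompt},
        ],
    )
    text = response.choices[0].message.content
    # The model is instructed to end with a flat JSON object; parse from its opening brace.
    return json.loads(text[text.rfind("{"):])
```

Moonshot (Kimi) and DeepSeek also offer OpenAI-compatible endpoints, so a sketch like this could in principle be pointed at moonshot-v1-128k or deepseek-chat by constructing the client with the vendor's base_url and API key; whether the published tool does so is not stated in the abstract.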


Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9ee/12404845/3ee60da8d1c3/hds.0322.fig.001.jpg
