
A question-answering framework for automated abstract screening using large language models.

Affiliations

Centre for Computational Science and Mathematical Modelling, Coventry University, Coventry CV1 2TT, United Kingdom.

Information School, The University of Sheffield, Sheffield S10 2AH, United Kingdom.

Publication Information

J Am Med Inform Assoc. 2024 Sep 1;31(9):1939-1952. doi: 10.1093/jamia/ocae166.

DOI: 10.1093/jamia/ocae166
PMID: 39042516
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11339526/
Abstract

OBJECTIVE

This paper aims to address the challenges in abstract screening within systematic reviews (SR) by leveraging the zero-shot capabilities of large language models (LLMs).

METHODS

We employ an LLM to prioritize candidate studies by aligning abstracts with the selection criteria outlined in an SR protocol. Abstract screening was transformed into a novel question-answering (QA) framework, treating each selection criterion as a question to be addressed by the LLM. The framework involves breaking the selection criteria down into multiple questions, prompting the LLM to answer each question, scoring and re-ranking each answer, and combining the responses to make nuanced inclusion or exclusion decisions.
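The screening loop described above (split criteria into questions, prompt the LLM per question, score, combine) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `ask_llm` is a hypothetical callable standing in for the prompted GPT-3.5 call, and the mean is a simple stand-in for the paper's scoring-and-combination step.

```python
# Sketch of the QA-style screening framework (hypothetical helper names).

def split_criteria(criteria):
    """Turn each selection criterion into one yes/no question."""
    return [f"Does the abstract satisfy this criterion: {c}?" for c in criteria]

def screen_abstract(abstract, criteria, ask_llm):
    """Combine per-criterion answer scores into one ranking score.

    ask_llm(question, abstract) -> score in [0, 1]; in the paper this is
    a prompted LLM, and answers are scored and re-ranked before combining.
    """
    questions = split_criteria(criteria)
    scores = [ask_llm(q, abstract) for q in questions]
    return sum(scores) / len(scores)

def prioritise(abstracts, criteria, ask_llm):
    """Rank candidate studies, highest combined score first."""
    return sorted(abstracts,
                  key=lambda a: screen_abstract(a, criteria, ask_llm),
                  reverse=True)
```

With any scoring function plugged in as `ask_llm`, `prioritise` returns the candidate list ordered for human review, which is the zero-shot prioritization setting evaluated in the paper.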

RESULTS AND DISCUSSION

Large-scale validation was performed on the benchmark of CLEF eHealth 2019 Task 2: Technology-Assisted Reviews in Empirical Medicine. Focusing on GPT-3.5 as a case study, the proposed QA framework consistently exhibited a clear advantage over traditional information retrieval approaches and bespoke BERT-family models fine-tuned for prioritizing candidate studies (ie, from BERT to PubMedBERT) across 31 datasets spanning 4 categories of SRs, underscoring its high potential in facilitating abstract screening. The experiments also showcased the viability of using selection criteria as a query for reference prioritization, and of the framework with different LLMs.

CONCLUSION

The investigation confirmed the value of leveraging selection criteria to improve the performance of automated abstract screening. LLMs demonstrated proficiency in prioritizing candidate studies for abstract screening under the proposed QA framework. Significant performance improvements were obtained by re-ranking answers using the semantic alignment between abstracts and selection criteria, further highlighting the pertinence of utilizing selection criteria to enhance abstract screening.
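As a concrete illustration of the re-ranking step in the conclusion, the sketch below blends an LLM answer score with a semantic-alignment score between each abstract and the selection criteria. A bag-of-words cosine stands in for the semantic similarity actually used; `alpha` and the linear blend are illustrative assumptions, not the paper's exact formula.

```python
import math
from collections import Counter

def cosine(text_a, text_b):
    """Bag-of-words cosine similarity; a stand-in for embedding similarity."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def rerank(abstracts, answer_scores, criteria_text, alpha=0.5):
    """Re-rank abstracts by blending LLM answer scores with
    abstract-criteria semantic alignment (linear blend is illustrative)."""
    blended = {
        a: alpha * s + (1 - alpha) * cosine(a, criteria_text)
        for a, s in zip(abstracts, answer_scores)
    }
    return sorted(abstracts, key=blended.get, reverse=True)
```

When two abstracts receive the same answer score from the LLM, the alignment term breaks the tie in favour of the abstract whose wording better matches the selection criteria, which is the intuition behind the reported re-ranking gains.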


Figures (PMC11339526):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf3c/11339526/eb8db22768ed/ocae166f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf3c/11339526/996dfdfb0c45/ocae166f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf3c/11339526/87a37ae64ad3/ocae166f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf3c/11339526/b662ed695921/ocae166f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf3c/11339526/8bd5f9c36341/ocae166f5.jpg

Similar Articles

1. A question-answering framework for automated abstract screening using large language models.
J Am Med Inform Assoc. 2024 Sep 1;31(9):1939-1952. doi: 10.1093/jamia/ocae166.
2. One LLM is not Enough: Harnessing the Power of Ensemble Learning for Medical Question Answering.
medRxiv. 2023 Dec 24:2023.12.21.23300380. doi: 10.1101/2023.12.21.23300380.
3. Evaluating the effectiveness of large language models in abstract screening: a comparative analysis.
Syst Rev. 2024 Aug 21;13(1):219. doi: 10.1186/s13643-024-02609-x.
4. Automated Paper Screening for Clinical Reviews Using Large Language Models: Data Analysis Study.
J Med Internet Res. 2024 Jan 12;26:e48996. doi: 10.2196/48996.
5. Can large language models replace humans in systematic reviews? Evaluating GPT-4's efficacy in screening and extracting data from peer-reviewed and grey literature in multiple languages.
Res Synth Methods. 2024 Jul;15(4):616-626. doi: 10.1002/jrsm.1715. Epub 2024 Mar 14.
6. Evaluating the Capabilities of Generative AI Tools in Understanding Medical Papers: Qualitative Study.
JMIR Med Inform. 2024 Sep 4;12:e59258. doi: 10.2196/59258.
7. Optimizing biomedical information retrieval with a keyword frequency-driven prompt enhancement strategy.
BMC Bioinformatics. 2024 Aug 27;25(1):281. doi: 10.1186/s12859-024-05902-7.
8. Natural language processing was effective in assisting rapid title and abstract screening when updating systematic reviews.
J Clin Epidemiol. 2021 May;133:121-129. doi: 10.1016/j.jclinepi.2021.01.010. Epub 2021 Jan 21.
9. Examining the Role of Large Language Models in Orthopedics: Systematic Review.
J Med Internet Res. 2024 Nov 15;26:e59607. doi: 10.2196/59607.
10. Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Cited By

1. The emergence of large language models as tools in literature reviews: a large language model-assisted systematic review.
J Am Med Inform Assoc. 2025 Jun 1;32(6):1071-1086. doi: 10.1093/jamia/ocaf063.
2. Large language models in biomedicine and health: current research landscape and future directions.
J Am Med Inform Assoc. 2024 Sep 1;31(9):1801-1811. doi: 10.1093/jamia/ocae202.

References

1. Automated Paper Screening for Clinical Reviews Using Large Language Models: Data Analysis Study.
J Med Internet Res. 2024 Jan 12;26:e48996. doi: 10.2196/48996.
2. Large language models encode clinical knowledge.
Nature. 2023 Aug;620(7972):172-180. doi: 10.1038/s41586-023-06291-2. Epub 2023 Jul 12.
3. How Does ChatGPT Perform on the United States Medical Licensing Examination (USMLE)? The Implications of Large Language Models for Medical Education and Knowledge Assessment.
JMIR Med Educ. 2023 Feb 8;9:e45312. doi: 10.2196/45312.
4. The rationale behind systematic reviews in clinical medicine: a conceptual framework.
J Diabetes Metab Disord. 2021 Apr 8;20(1):919-929. doi: 10.1007/s40200-021-00773-8. eCollection 2021 Jun.
5. Trialstreamer: A living, automatically updated database of clinical trial reports.
J Am Med Inform Assoc. 2020 Dec 9;27(12):1903-1912. doi: 10.1093/jamia/ocaa163.
6. BioBERT: a pre-trained biomedical language representation model for biomedical text mining.
Bioinformatics. 2020 Feb 15;36(4):1234-1240. doi: 10.1093/bioinformatics/btz682.
7. The significant cost of systematic reviews and meta-analyses: A call for greater involvement of machine learning to assess the promise of clinical trials.
Contemp Clin Trials Commun. 2019 Aug 25;16:100443. doi: 10.1016/j.conctc.2019.100443. eCollection 2019 Dec.
8. Toward systematic review automation: a practical guide to using machine learning tools in research synthesis.
Syst Rev. 2019 Jul 11;8(1):163. doi: 10.1186/s13643-019-1074-9.
9. A question of trust: can we build an evidence base to gain trust in systematic review automation technologies?
Syst Rev. 2019 Jun 18;8(1):143. doi: 10.1186/s13643-019-1062-0.
10. Automating Biomedical Evidence Synthesis: RobotReviewer.
Proc Conf Assoc Comput Linguist Meet. 2017 Jul;2017:7-12. doi: 10.18653/v1/P17-4002.