Iyer Radhika, Christie Alec Philip, Madhavapeddy Anil, Reynolds Sam, Sutherland William, Jaffer Sadiq
Department of Zoology, University of Cambridge, Cambridge, United Kingdom.
Centre for Environmental Policy, Imperial College London, United Kingdom.
PLoS One. 2025 May 15;20(5):e0323563. doi: 10.1371/journal.pone.0323563. eCollection 2025.
Wise use of evidence to support efficient conservation action is key to tackling biodiversity loss with limited time and resources. Evidence syntheses provide key recommendations for conservation decision-makers by assessing and summarising evidence, but are not always easy to access, digest, and use. Recent advances in Large Language Models (LLMs) present both opportunities and risks in enabling faster and more intuitive systems to access evidence syntheses and databases. Such systems for natural language search and open-ended evidence-based responses are pipelines comprising many components. The most critical of these components are the LLM used and how evidence is retrieved from the database. We evaluated the performance of ten LLMs across six different database retrieval strategies against human experts in answering synthetic multiple-choice question exams on the effects of conservation interventions, using the Conservation Evidence database. We found that LLM performance was comparable with that of human experts over 45 filtered questions, both in correctly answering them and in retrieving the document used to generate them. Across 1867 unfiltered questions, LLM performance demonstrated a level of conservation-specific knowledge, but this varied across topic areas. A hybrid retrieval strategy that combines keywords and vector embeddings performed best by a substantial margin. We also tested against a previous-generation LLM that was state-of-the-art at its release, and it was outperformed by all ten current models, including smaller, cheaper models. Our findings suggest that, with careful domain-specific design, LLMs could be powerful tools for enabling expert-level use of evidence syntheses and databases in different disciplines. However, general LLMs used 'out-of-the-box' are likely to perform poorly and misinform decision-makers. By establishing that LLMs exhibit performance comparable with human synthesis experts when providing restricted responses to queries of evidence syntheses and databases, our approach gives future work a basis for quantifying LLM performance in providing open-ended responses.
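The abstract does not specify how the best-performing hybrid retrieval strategy was implemented. As an illustration only, the minimal Python sketch below shows one common way to combine keyword and vector-embedding retrieval: BM25 keyword scores and dense cosine similarities are fused with reciprocal rank fusion. The libraries (rank_bm25, sentence_transformers), model name, example documents, and the hybrid_search function are all assumptions for illustration, not the authors' pipeline.

```python
# Illustrative sketch only: hybrid retrieval fusing keyword (BM25) and
# dense-embedding rankings via reciprocal rank fusion (RRF).
import numpy as np
from rank_bm25 import BM25Okapi                         # assumed keyword scorer
from sentence_transformers import SentenceTransformer   # assumed embedding model

# Toy stand-ins for evidence-synthesis summaries (hypothetical content).
documents = [
    "Installing nest boxes increased occupancy by target bird species.",
    "Predator-exclusion fencing reduced nest predation on shorebirds.",
    "Hedgerow planting had mixed effects on farmland invertebrates.",
]

# Keyword index over whitespace-tokenised documents.
bm25 = BM25Okapi([doc.lower().split() for doc in documents])

# Dense index: one normalised embedding vector per document.
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = model.encode(documents, normalize_embeddings=True)

def hybrid_search(query: str, k: int = 2, rrf_k: int = 60) -> list[str]:
    """Rank documents by fusing BM25 and cosine-similarity rankings."""
    bm25_scores = bm25.get_scores(query.lower().split())
    q_emb = model.encode([query], normalize_embeddings=True)[0]
    dense_scores = doc_emb @ q_emb  # cosine similarity (vectors are normalised)

    # Convert each score array to a rank ordering (best first).
    bm25_rank = np.argsort(-bm25_scores)
    dense_rank = np.argsort(-dense_scores)

    # Reciprocal rank fusion: sum 1 / (rrf_k + rank) across both rankings.
    fused = np.zeros(len(documents))
    for ranking in (bm25_rank, dense_rank):
        for rank, doc_idx in enumerate(ranking):
            fused[doc_idx] += 1.0 / (rrf_k + rank + 1)

    return [documents[i] for i in np.argsort(-fused)[:k]]

print(hybrid_search("Do nest boxes help birds?"))
```

The appeal of this kind of fusion, and a plausible reason a hybrid strategy could outperform either component alone, is that keyword scoring preserves exact intervention and species terms while embeddings capture paraphrased queries; RRF combines the two without needing their raw scores to be on the same scale.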