

Large language models surpass human experts in predicting neuroscience results.

Authors

Luo Xiaoliang, Rechardt Akilles, Sun Guangzhi, Nejad Kevin K, Yáñez Felipe, Yilmaz Bati, Lee Kangjoo, Cohen Alexandra O, Borghesani Valentina, Pashkov Anton, Marinazzo Daniele, Nicholas Jonathan, Salatiello Alessandro, Sucholutsky Ilia, Minervini Pasquale, Razavi Sepehr, Rocca Roberta, Yusifov Elkhan, Okalova Tereza, Gu Nianlong, Ferianc Martin, Khona Mikail, Patil Kaustubh R, Lee Pui-Shee, Mata Rui, Myers Nicholas E, Bizley Jennifer K, Musslick Sebastian, Bilgin Isil Poyraz, Niso Guiomar, Ales Justin M, Gaebler Michael, Ratan Murty N Apurva, Loued-Khenissi Leyla, Behler Anna, Hall Chloe M, Dafflon Jessica, Bao Sherry Dongqi, Love Bradley C

Affiliations

Department of Experimental Psychology, University College London, London, UK.

Department of Engineering, University of Cambridge, Cambridge, UK.

Publication

Nat Hum Behav. 2025 Feb;9(2):305-315. doi: 10.1038/s41562-024-02046-9. Epub 2024 Nov 27.

DOI: 10.1038/s41562-024-02046-9
PMID: 39604572
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11860209/
Abstract

Scientific discoveries often hinge on synthesizing decades of research, a task that potentially outstrips human information processing capacities. Large language models (LLMs) offer a solution. LLMs trained on the vast scientific literature could potentially integrate noisy yet interrelated findings to forecast novel results better than human experts. Here, to evaluate this possibility, we created BrainBench, a forward-looking benchmark for predicting neuroscience results. We find that LLMs surpass experts in predicting experimental outcomes. BrainGPT, an LLM we tuned on the neuroscience literature, performed better yet. Like human experts, when LLMs indicated high confidence in their predictions, their responses were more likely to be correct, which presages a future where LLMs assist humans in making discoveries. Our approach is not neuroscience specific and is transferable to other knowledge-intensive endeavours.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae8d/11860209/8b8ec4c5b7e6/41562_2024_2046_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae8d/11860209/db200978a64d/41562_2024_2046_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae8d/11860209/79ab175cfb94/41562_2024_2046_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae8d/11860209/13963eb3eeca/41562_2024_2046_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae8d/11860209/7deb72c179ae/41562_2024_2046_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae8d/11860209/8b8ec4c5b7e6/41562_2024_2046_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae8d/11860209/db200978a64d/41562_2024_2046_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae8d/11860209/79ab175cfb94/41562_2024_2046_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae8d/11860209/13963eb3eeca/41562_2024_2046_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae8d/11860209/7deb72c179ae/41562_2024_2046_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae8d/11860209/8b8ec4c5b7e6/41562_2024_2046_Fig5_HTML.jpg

Similar articles

1. Large language models surpass human experts in predicting neuroscience results.
Nat Hum Behav. 2025 Feb;9(2):305-315. doi: 10.1038/s41562-024-02046-9. Epub 2024 Nov 27.

2. A dataset and benchmark for hospital course summarization with adapted large language models.
J Am Med Inform Assoc. 2025 Mar 1;32(3):470-479. doi: 10.1093/jamia/ocae312.

3. Large Language Models Can Enable Inductive Thematic Analysis of a Social Media Corpus in a Single Prompt: Human Validation Study.
JMIR Infodemiology. 2024 Aug 29;4:e59641. doi: 10.2196/59641.

4. Large language models for conducting systematic reviews: on the rise, but not yet ready for use-a scoping review.
J Clin Epidemiol. 2025 May;181:111746. doi: 10.1016/j.jclinepi.2025.111746. Epub 2025 Feb 26.

5. Effectiveness of Transformer-Based Large Language Models in Identifying Adverse Drug Reaction Relations from Unstructured Discharge Summaries in Singapore.
Drug Saf. 2025 Jun;48(6):667-677. doi: 10.1007/s40264-025-01525-w. Epub 2025 Feb 21.

6. A comprehensive evaluation of large Language models on benchmark biomedical text processing tasks.
Comput Biol Med. 2024 Mar;171:108189. doi: 10.1016/j.compbiomed.2024.108189. Epub 2024 Feb 20.

7. Careful design of Large Language Model pipelines enables expert-level retrieval of evidence-based information from syntheses and databases.
PLoS One. 2025 May 15;20(5):e0323563. doi: 10.1371/journal.pone.0323563. eCollection 2025.

8. Laypeople's Use of and Attitudes Toward Large Language Models and Search Engines for Health Queries: Survey Study.
J Med Internet Res. 2025 Feb 13;27:e64290. doi: 10.2196/64290.

9. AI in Home Care-Evaluation of Large Language Models for Future Training of Informal Caregivers: Observational Comparative Case Study.
J Med Internet Res. 2025 Apr 28;27:e70703. doi: 10.2196/70703.

10. Empowering large language models for automated clinical assessment with generation-augmented retrieval and hierarchical chain-of-thought.
Artif Intell Med. 2025 Apr;162:103078. doi: 10.1016/j.artmed.2025.103078. Epub 2025 Feb 12.

Cited by

1. Most prominent challenges in translational neuroscience and strategic solutions to bridge the gaps: Perspectives from an editorial board interrogation.
Explor Neurosci. 2025;4. doi: 10.37349/en.2025.1006106. Epub 2025 Aug 12.

2. An Intelligent Infrastructure as a Foundation for Modern Science.
ArXiv. 2025 Aug 12:arXiv:2508.10051v1.

3. Can LLMs effectively assist medical coding? Evaluating GPT performance on DRG and targeted clinical tasks.
BMC Med Inform Decis Mak. 2025 Aug 19;25(1):312. doi: 10.1186/s12911-025-03151-z.

4. Facilitating analysis of open neurophysiology data on the DANDI Archive using large language model tools.
bioRxiv. 2025 Jul 24:2025.07.17.663965. doi: 10.1101/2025.07.17.663965.

5. Could machine learning help to build a unified theory of cognition?
Nature. 2025 Jul 29. doi: 10.1038/d41586-025-02353-9.

6. NiCLIP: Neuroimaging contrastive language-image pretraining model for predicting text from brain activation images.
bioRxiv. 2025 Aug 2:2025.06.14.659706. doi: 10.1101/2025.06.14.659706.

7. Will AI become our Co-PI?
NPJ Digit Med. 2025 Jul 14;8(1):440. doi: 10.1038/s41746-025-01859-w.

8. Knowledge-Driven Feature Selection and Engineering for Genotype Data with Large Language Models.
AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:250-259. eCollection 2025.

9. Evaluating the performance of large language & visual-language models in cervical cytology screening.
NPJ Precis Oncol. 2025 May 23;9(1):153. doi: 10.1038/s41698-025-00916-7.

10. A study of the deconstruction and construction of self-efficacy in internet use among older people.
BMC Geriatr. 2025 May 20;25(1):355. doi: 10.1186/s12877-025-05892-y.

References

1. Visual proteomics.
Nat Methods. 2023 Dec;20(12):1868. doi: 10.1038/s41592-023-02104-6.

2. A multilevel account of hippocampal function in spatial and concept learning: Bridging models of behavior and neural assemblies.
Sci Adv. 2023 Jul 21;9(29):eade6903. doi: 10.1126/sciadv.ade6903.

3. Bayesian modeling of human-AI complementarity.
Proc Natl Acad Sci U S A. 2022 Mar 15;119(11):e2111547119. doi: 10.1073/pnas.2111547119. Epub 2022 Mar 11.

4. Slowed canonical progress in large fields of science.
Proc Natl Acad Sci U S A. 2021 Oct 12;118(41). doi: 10.1073/pnas.2021636118.

5. Highly accurate protein structure prediction for the human proteome.
Nature. 2021 Aug;596(7873):590-596. doi: 10.1038/s41586-021-03828-1. Epub 2021 Jul 22.

6. Variability in the analysis of a single neuroimaging dataset by many teams.
Nature. 2020 Jun;582(7810):84-88. doi: 10.1038/s41586-020-2314-9. Epub 2020 May 20.

7. Deep learning enables rapid identification of potent DDR1 kinase inhibitors.
Nat Biotechnol. 2019 Sep;37(9):1038-1040. doi: 10.1038/s41587-019-0224-x. Epub 2019 Sep 2.

8. Unsupervised word embeddings capture latent knowledge from materials science literature.
Nature. 2019 Jul;571(7763):95-98. doi: 10.1038/s41586-019-1335-8. Epub 2019 Jul 3.

9. Gorilla in our midst: An online behavioral experiment builder.
Behav Res Methods. 2020 Feb;52(1):388-407. doi: 10.3758/s13428-019-01237-x.