

Large language models surpass human experts in predicting neuroscience results.

Authors

Luo Xiaoliang, Rechardt Akilles, Sun Guangzhi, Nejad Kevin K, Yáñez Felipe, Yilmaz Bati, Lee Kangjoo, Cohen Alexandra O, Borghesani Valentina, Pashkov Anton, Marinazzo Daniele, Nicholas Jonathan, Salatiello Alessandro, Sucholutsky Ilia, Minervini Pasquale, Razavi Sepehr, Rocca Roberta, Yusifov Elkhan, Okalova Tereza, Gu Nianlong, Ferianc Martin, Khona Mikail, Patil Kaustubh R, Lee Pui-Shee, Mata Rui, Myers Nicholas E, Bizley Jennifer K, Musslick Sebastian, Bilgin Isil Poyraz, Niso Guiomar, Ales Justin M, Gaebler Michael, Ratan Murty N Apurva, Loued-Khenissi Leyla, Behler Anna, Hall Chloe M, Dafflon Jessica, Bao Sherry Dongqi, Love Bradley C

Affiliations

Department of Experimental Psychology, University College London, London, UK.

Department of Engineering, University of Cambridge, Cambridge, UK.

Publication

Nat Hum Behav. 2025 Feb;9(2):305-315. doi: 10.1038/s41562-024-02046-9. Epub 2024 Nov 27.

DOI: 10.1038/s41562-024-02046-9
PMID: 39604572
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11860209/
Abstract

Scientific discoveries often hinge on synthesizing decades of research, a task that potentially outstrips human information processing capacities. Large language models (LLMs) offer a solution. LLMs trained on the vast scientific literature could potentially integrate noisy yet interrelated findings to forecast novel results better than human experts. Here, to evaluate this possibility, we created BrainBench, a forward-looking benchmark for predicting neuroscience results. We find that LLMs surpass experts in predicting experimental outcomes. BrainGPT, an LLM we tuned on the neuroscience literature, performed better yet. Like human experts, when LLMs indicated high confidence in their predictions, their responses were more likely to be correct, which presages a future where LLMs assist humans in making discoveries. Our approach is not neuroscience specific and is transferable to other knowledge-intensive endeavours.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae8d/11860209/8b8ec4c5b7e6/41562_2024_2046_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae8d/11860209/db200978a64d/41562_2024_2046_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae8d/11860209/79ab175cfb94/41562_2024_2046_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae8d/11860209/13963eb3eeca/41562_2024_2046_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae8d/11860209/7deb72c179ae/41562_2024_2046_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae8d/11860209/8b8ec4c5b7e6/41562_2024_2046_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae8d/11860209/db200978a64d/41562_2024_2046_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae8d/11860209/79ab175cfb94/41562_2024_2046_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae8d/11860209/13963eb3eeca/41562_2024_2046_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae8d/11860209/7deb72c179ae/41562_2024_2046_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae8d/11860209/8b8ec4c5b7e6/41562_2024_2046_Fig5_HTML.jpg

Similar articles

1. Large language models surpass human experts in predicting neuroscience results.
Nat Hum Behav. 2025 Feb;9(2):305-315. doi: 10.1038/s41562-024-02046-9. Epub 2024 Nov 27.

2. A dataset and benchmark for hospital course summarization with adapted large language models.
J Am Med Inform Assoc. 2025 Mar 1;32(3):470-479. doi: 10.1093/jamia/ocae312.

3. Large Language Models Can Enable Inductive Thematic Analysis of a Social Media Corpus in a Single Prompt: Human Validation Study.
JMIR Infodemiology. 2024 Aug 29;4:e59641. doi: 10.2196/59641.

4. Large language models for conducting systematic reviews: on the rise, but not yet ready for use-a scoping review.
J Clin Epidemiol. 2025 May;181:111746. doi: 10.1016/j.jclinepi.2025.111746. Epub 2025 Feb 26.

5. Effectiveness of Transformer-Based Large Language Models in Identifying Adverse Drug Reaction Relations from Unstructured Discharge Summaries in Singapore.
Drug Saf. 2025 Jun;48(6):667-677. doi: 10.1007/s40264-025-01525-w. Epub 2025 Feb 21.

6. A comprehensive evaluation of large Language models on benchmark biomedical text processing tasks.
Comput Biol Med. 2024 Mar;171:108189. doi: 10.1016/j.compbiomed.2024.108189. Epub 2024 Feb 20.

7. Careful design of Large Language Model pipelines enables expert-level retrieval of evidence-based information from syntheses and databases.
PLoS One. 2025 May 15;20(5):e0323563. doi: 10.1371/journal.pone.0323563. eCollection 2025.

8. Laypeople's Use of and Attitudes Toward Large Language Models and Search Engines for Health Queries: Survey Study.
J Med Internet Res. 2025 Feb 13;27:e64290. doi: 10.2196/64290.

9. AI in Home Care-Evaluation of Large Language Models for Future Training of Informal Caregivers: Observational Comparative Case Study.
J Med Internet Res. 2025 Apr 28;27:e70703. doi: 10.2196/70703.

10. Empowering large language models for automated clinical assessment with generation-augmented retrieval and hierarchical chain-of-thought.
Artif Intell Med. 2025 Apr;162:103078. doi: 10.1016/j.artmed.2025.103078. Epub 2025 Feb 12.

Cited by

1. Most prominent challenges in translational neuroscience and strategic solutions to bridge the gaps: Perspectives from an editorial board interrogation.
Explor Neurosci. 2025;4. doi: 10.37349/en.2025.1006106. Epub 2025 Aug 12.

2. An Intelligent Infrastructure as a Foundation for Modern Science.
ArXiv. 2025 Aug 12:arXiv:2508.10051v1.

3. Can LLMs effectively assist medical coding? Evaluating GPT performance on DRG and targeted clinical tasks.
BMC Med Inform Decis Mak. 2025 Aug 19;25(1):312. doi: 10.1186/s12911-025-03151-z.

4. Facilitating analysis of open neurophysiology data on the DANDI Archive using large language model tools.
bioRxiv. 2025 Jul 24:2025.07.17.663965. doi: 10.1101/2025.07.17.663965.

5. Could machine learning help to build a unified theory of cognition?
Nature. 2025 Jul 29. doi: 10.1038/d41586-025-02353-9.

6. NiCLIP: Neuroimaging contrastive language-image pretraining model for predicting text from brain activation images.
bioRxiv. 2025 Aug 2:2025.06.14.659706. doi: 10.1101/2025.06.14.659706.

7. Will AI become our Co-PI?
NPJ Digit Med. 2025 Jul 14;8(1):440. doi: 10.1038/s41746-025-01859-w.

8. Knowledge-Driven Feature Selection and Engineering for Genotype Data with Large Language Models.
AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:250-259. eCollection 2025.

9. Evaluating the performance of large language & visual-language models in cervical cytology screening.
NPJ Precis Oncol. 2025 May 23;9(1):153. doi: 10.1038/s41698-025-00916-7.

10. A study of the deconstruction and construction of self-efficacy in internet use among older people.
BMC Geriatr. 2025 May 20;25(1):355. doi: 10.1186/s12877-025-05892-y.

References

1. Visual proteomics.
Nat Methods. 2023 Dec;20(12):1868. doi: 10.1038/s41592-023-02104-6.

2. A multilevel account of hippocampal function in spatial and concept learning: Bridging models of behavior and neural assemblies.
Sci Adv. 2023 Jul 21;9(29):eade6903. doi: 10.1126/sciadv.ade6903.

3. Bayesian modeling of human-AI complementarity.
Proc Natl Acad Sci U S A. 2022 Mar 15;119(11):e2111547119. doi: 10.1073/pnas.2111547119. Epub 2022 Mar 11.

4. Slowed canonical progress in large fields of science.
Proc Natl Acad Sci U S A. 2021 Oct 12;118(41). doi: 10.1073/pnas.2021636118.

5. Highly accurate protein structure prediction for the human proteome.
Nature. 2021 Aug;596(7873):590-596. doi: 10.1038/s41586-021-03828-1. Epub 2021 Jul 22.

6. Variability in the analysis of a single neuroimaging dataset by many teams.
Nature. 2020 Jun;582(7810):84-88. doi: 10.1038/s41586-020-2314-9. Epub 2020 May 20.

7. Deep learning enables rapid identification of potent DDR1 kinase inhibitors.
Nat Biotechnol. 2019 Sep;37(9):1038-1040. doi: 10.1038/s41587-019-0224-x. Epub 2019 Sep 2.

8. Unsupervised word embeddings capture latent knowledge from materials science literature.
Nature. 2019 Jul;571(7763):95-98. doi: 10.1038/s41586-019-1335-8. Epub 2019 Jul 3.

9. Gorilla in our midst: An online behavioral experiment builder.
Behav Res Methods. 2020 Feb;52(1):388-407. doi: 10.3758/s13428-019-01237-x.