Princeton Neuroscience Institute, Princeton University, Princeton, NJ, 08540, USA.
Department of Computer Science, Princeton University, Princeton, NJ, 08540, USA.
Nat Commun. 2024 Jun 29;15(1):5523. doi: 10.1038/s41467-024-49173-5.
When processing language, the brain is thought to deploy specialized computations to construct meaning from complex linguistic structures. Recently, artificial neural networks based on the Transformer architecture have revolutionized the field of natural language processing. Transformers integrate contextual information across words via structured circuit computations. Prior work has focused on the internal representations ("embeddings") generated by these circuits. In this paper, we instead analyze the circuit computations directly: we deconstruct these computations into the functionally specialized "transformations" that integrate contextual information across words. Using functional MRI data acquired while participants listened to naturalistic stories, we first verify that the transformations account for considerable variance in brain activity across the cortical language network. We then demonstrate that the emergent computations performed by individual, functionally specialized "attention heads" differentially predict brain activity in specific cortical regions. These heads fall along gradients corresponding to different layers and context lengths in a low-dimensional cortical space.
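To make the notion of per-head "transformations" concrete, the sketch below shows one way to extract attention-weighted value vectors for every head and layer of a Transformer. GPT-2 and the HuggingFace transformers API are illustrative assumptions here, not the paper's reported pipeline.

```python
# Minimal sketch: extract per-head "transformations" (attention-weighted value
# vectors) from a Transformer. GPT-2 and HuggingFace `transformers` are
# illustrative assumptions, not necessarily the model used in the paper.
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2").eval()

inputs = tokenizer("Participants listened to naturalistic stories.",
                   return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True, output_attentions=True)

n_head = model.config.n_head
head_dim = model.config.n_embd // n_head

transformations = []  # one (batch, n_head, seq_len, head_dim) tensor per layer
for i, block in enumerate(model.h):
    # hidden_states[i] is the input to block i; GPT-2 applies LayerNorm
    # before attention ("pre-LN"), so replicate that here.
    x = block.ln_1(out.hidden_states[i])
    # c_attn projects to concatenated query/key/value; keep only the values.
    _, _, v = block.attn.c_attn(x).split(model.config.n_embd, dim=-1)
    v = v.view(v.size(0), v.size(1), n_head, head_dim).transpose(1, 2)
    # Per-head transformation: attention weights times value vectors,
    # i.e., each head's contextual update before the output projection.
    transformations.append(out.attentions[i] @ v)
```

In an encoding analysis along these lines, stacking the per-layer tensors (resampled to the fMRI acquisition times) would yield the feature space that a regularized regression model could map onto voxelwise brain activity.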