Large language models transition from integrating across position-yoked, exponential windows to structure-yoked, power-law windows.

Author Information

Skrill David, Norman-Haignere Sam V

Affiliations

Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, NY 14642.

Departments of Biostatistics and Computational Biology, and Neuroscience, University of Rochester Medical Center, Rochester, NY 14642.

Publication Information

Adv Neural Inf Process Syst. 2023 Dec;36:638-654.

Abstract

Modern language models excel at integrating across long temporal scales needed to encode linguistic meaning and show non-trivial similarities to biological neural systems. Prior work suggests that human brain responses to language exhibit hierarchically organized "integration windows" that substantially constrain the overall influence of an input token (e.g., a word) on the neural response. However, little prior work has attempted to use integration windows to characterize computations in large language models (LLMs). We developed a simple word-swap procedure for estimating integration windows from black-box language models that does not depend on access to gradients or knowledge of the model architecture (e.g., attention weights). Using this method, we show that trained LLMs exhibit stereotyped integration windows that are well-fit by a convex combination of an exponential and a power-law function, with a partial transition from exponential to power-law dynamics across network layers. We then introduce a metric for quantifying the extent to which these integration windows vary with structural boundaries (e.g., sentence boundaries), and using this metric, we show that integration windows become increasingly yoked to structure at later network layers. None of these findings were observed in an untrained model, which as expected integrated uniformly across its input. These results suggest that LLMs learn to integrate information in natural language using a stereotyped pattern: integrating across position-yoked, exponential windows at early layers, followed by structure-yoked, power-law windows at later layers. The methods we describe in this paper provide a general-purpose toolkit for understanding temporal integration in language models, facilitating cross-disciplinary research at the intersection of biological and artificial intelligence.
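The sketch below (not the authors' released code) illustrates the two methods named in the abstract: a word-swap probe that estimates how strongly a token at distance d still influences a black-box model's response, and a fit of the resulting integration window with a convex combination of an exponential and a power-law decay. The probe shown here is one simplified reading of the word-swap idea; `get_response`, the mixture parameterization (lam, tau, alpha), and the synthetic data are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of (1) a black-box word-swap influence probe and
# (2) fitting an exponential + power-law mixture to the measured window.
# All names and parameter choices below are hypothetical.
import numpy as np
from scipy.optimize import curve_fit

def get_response(tokens):
    """Hypothetical black-box probe: returns a vector-valued model response
    at the final position (e.g., a hidden state or next-token logits)."""
    raise NotImplementedError  # plug in the model under study here

def swap_influence(tokens, vocab, distances, rng):
    """Estimate how much a word d positions back still affects the probed
    response, by swapping it for a random word and measuring the change."""
    base = get_response(tokens)
    influence = []
    for d in distances:
        swapped = list(tokens)
        swapped[-d] = rng.choice(vocab)  # replace the word d positions back
        influence.append(np.linalg.norm(get_response(swapped) - base))
    return np.asarray(influence)

def exp_power_mixture(d, lam, tau, alpha):
    """Convex combination of exponential and power-law decay (assumed form)."""
    return lam * np.exp(-d / tau) + (1.0 - lam) * d ** (-alpha)

# Example fit on synthetic data standing in for an averaged influence curve.
rng = np.random.default_rng(0)
distances = np.arange(1, 65)
measured = exp_power_mixture(distances, 0.4, 5.0, 0.8)
measured = measured + rng.normal(0.0, 0.01, distances.size)

params, _ = curve_fit(
    exp_power_mixture, distances, measured,
    p0=[0.5, 10.0, 1.0],
    bounds=([0.0, 1e-3, 1e-3], [1.0, 1e3, 10.0]),  # lam in [0,1]; tau, alpha > 0
)
lam_hat, tau_hat, alpha_hat = params
print(f"mixing weight={lam_hat:.2f}, tau={tau_hat:.1f}, alpha={alpha_hat:.2f}")
```

In this framing, the fitted mixing weight would indicate how exponential-like versus power-law-like a given layer's window is, which is the quantity the abstract describes as shifting across network depth.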
