Large language models transition from integrating across position-yoked, exponential windows to structure-yoked, power-law windows.

Author Information

Skrill David, Norman-Haignere Sam V

Affiliations

Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, NY 14642.

Departments of Biostatistics and Computational Biology, and Neuroscience, University of Rochester Medical Center, Rochester, NY 14642.

Publication Information

Adv Neural Inf Process Syst. 2023 Dec;36:638-654.

Abstract

Modern language models excel at integrating across long temporal scales needed to encode linguistic meaning and show non-trivial similarities to biological neural systems. Prior work suggests that human brain responses to language exhibit hierarchically organized "integration windows" that substantially constrain the overall influence of an input token (e.g., a word) on the neural response. However, little prior work has attempted to use integration windows to characterize computations in large language models (LLMs). We developed a simple word-swap procedure for estimating integration windows from black-box language models that does not depend on access to gradients or knowledge of the model architecture (e.g., attention weights). Using this method, we show that trained LLMs exhibit stereotyped integration windows that are well-fit by a convex combination of an exponential and a power-law function, with a partial transition from exponential to power-law dynamics across network layers. We then introduce a metric for quantifying the extent to which these integration windows vary with structural boundaries (e.g., sentence boundaries), and using this metric, we show that integration windows become increasingly yoked to structure at later network layers. None of these findings were observed in an untrained model, which as expected integrated uniformly across its input. These results suggest that LLMs learn to integrate information in natural language using a stereotyped pattern: integrating across position-yoked, exponential windows at early layers, followed by structure-yoked, power-law windows at later layers. The methods we describe in this paper provide a general-purpose toolkit for understanding temporal integration in language models, facilitating cross-disciplinary research at the intersection of biological and artificial intelligence.
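The sketch below (not the authors' released code) illustrates the two methods named in the abstract: a word-swap probe that estimates how strongly a token at distance d still influences a black-box model's response, and a fit of the resulting integration window with a convex combination of an exponential and a power-law decay. The probe shown here is one simplified reading of the word-swap idea; `get_response`, the mixture parameterization (lam, tau, alpha), and the synthetic data are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of (1) a black-box word-swap influence probe and
# (2) fitting an exponential + power-law mixture to the measured window.
# All names and parameter choices below are hypothetical.
import numpy as np
from scipy.optimize import curve_fit

def get_response(tokens):
    """Hypothetical black-box probe: returns a vector-valued model response
    at the final position (e.g., a hidden state or next-token logits)."""
    raise NotImplementedError  # plug in the model under study here

def swap_influence(tokens, vocab, distances, rng):
    """Estimate how much a word d positions back still affects the probed
    response, by swapping it for a random word and measuring the change."""
    base = get_response(tokens)
    influence = []
    for d in distances:
        swapped = list(tokens)
        swapped[-d] = rng.choice(vocab)  # replace the word d positions back
        influence.append(np.linalg.norm(get_response(swapped) - base))
    return np.asarray(influence)

def exp_power_mixture(d, lam, tau, alpha):
    """Convex combination of exponential and power-law decay (assumed form)."""
    return lam * np.exp(-d / tau) + (1.0 - lam) * d ** (-alpha)

# Example fit on synthetic data standing in for an averaged influence curve.
rng = np.random.default_rng(0)
distances = np.arange(1, 65)
measured = exp_power_mixture(distances, 0.4, 5.0, 0.8)
measured = measured + rng.normal(0.0, 0.01, distances.size)

params, _ = curve_fit(
    exp_power_mixture, distances, measured,
    p0=[0.5, 10.0, 1.0],
    bounds=([0.0, 1e-3, 1e-3], [1.0, 1e3, 10.0]),  # lam in [0,1]; tau, alpha > 0
)
lam_hat, tau_hat, alpha_hat = params
print(f"mixing weight={lam_hat:.2f}, tau={tau_hat:.1f}, alpha={alpha_hat:.2f}")
```

In this framing, the fitted mixing weight would indicate how exponential-like versus power-law-like a given layer's window is, which is the quantity the abstract describes as shifting across network depth.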
