Skrill David, Norman-Haignere Sam V
Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, NY 14642.
Departments of Biostatistics and Computational Biology and Neuroscience, University of Rochester Medical Center, Rochester, NY 14642.
Adv Neural Inf Process Syst. 2023 Dec;36:638-654.
Modern language models excel at integrating across long temporal scales needed to encode linguistic meaning and show non-trivial similarities to biological neural systems. Prior work suggests that human brain responses to language exhibit hierarchically organized "integration windows" that substantially constrain the overall influence of an input token (e.g., a word) on the neural response. However, little prior work has attempted to use integration windows to characterize computations in large language models (LLMs). We developed a simple word-swap procedure for estimating integration windows from black-box language models that does not depend on access to gradients or knowledge of the model architecture (e.g., attention weights). Using this method, we show that trained LLMs exhibit stereotyped integration windows that are well-fit by a convex combination of an exponential and a power-law function, with a partial transition from exponential to power-law dynamics across network layers. We then introduce a metric for quantifying the extent to which these integration windows vary with structural boundaries (e.g., sentence boundaries), and using this metric, we show that integration windows become increasingly yoked to structure at later network layers. None of these findings were observed in an untrained model, which, as expected, integrated uniformly across its input. These results suggest that LLMs learn to integrate information in natural language using a stereotyped pattern: integrating across position-yoked, exponential windows at early layers, followed by structure-yoked, power-law windows at later layers. The methods we describe in this paper provide a general-purpose toolkit for understanding temporal integration in language models, facilitating cross-disciplinary research at the intersection of biological and artificial intelligence.
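The abstract names two components that lend themselves to a short illustration: a black-box word-swap probe of integration windows and a fit of the resulting window profile with a convex combination of an exponential and a power-law decay. The sketch below is an illustrative reconstruction, not the authors' released code; the callable get_layer_activations, the choice of swap source, the distance grid, and the exact parameterization of the mixture are assumptions made for demonstration (Python with NumPy/SciPy).

```python
# Minimal sketch (assumptions noted above), illustrating:
# (1) a black-box "word-swap" probe of integration windows, and
# (2) fitting the window profile with an exponential + power-law mixture.
import numpy as np
from scipy.optimize import curve_fit

def swap_sensitivity(get_layer_activations, tokens, swap_tokens, target_idx, distance):
    """Estimate how much the representation at `target_idx` changes when the token
    `distance` positions earlier is replaced by a token from an unrelated context.

    `get_layer_activations(tokens) -> np.ndarray [seq_len, hidden]` is assumed to be
    any black-box callable exposing per-token activations; no gradients or attention
    weights are required.
    """
    swap_pos = target_idx - distance
    if swap_pos < 0:
        return np.nan
    swapped = list(tokens)
    swapped[swap_pos] = swap_tokens[swap_pos % len(swap_tokens)]
    orig = get_layer_activations(tokens)[target_idx]
    pert = get_layer_activations(swapped)[target_idx]
    # Normalized change in the target-token representation:
    # ~0 means the swapped position lies outside the integration window,
    # larger values mean the swapped position still influences the response.
    return np.linalg.norm(orig - pert) / (np.linalg.norm(orig) + 1e-8)

def exp_powerlaw_window(d, alpha, tau, beta, scale):
    """Convex combination of an exponential and a power-law decay over distance d."""
    exp_part = np.exp(-d / tau)
    pow_part = (1.0 + d) ** (-beta)
    return scale * (alpha * exp_part + (1.0 - alpha) * pow_part)

def fit_window_profile(distances, sensitivities):
    """Fit measured swap sensitivities with the exponential/power-law mixture."""
    p0 = [0.5, 5.0, 1.0, 1.0]  # initial guess: alpha, tau, beta, scale
    bounds = ([0.0, 1e-3, 1e-3, 0.0], [1.0, 1e3, 10.0, 10.0])
    popt, _ = curve_fit(exp_powerlaw_window, np.asarray(distances),
                        np.asarray(sensitivities), p0=p0, bounds=bounds)
    return dict(zip(["alpha", "tau", "beta", "scale"], popt))
```

In this reading, one would sweep `distance` over a grid for many target positions and contexts, average the resulting sensitivities per distance, fit the mixture separately at each layer, and track the fitted weight alpha across layers to see the exponential-to-power-law transition described in the abstract; the averaging and layer-wise comparison details here are assumptions, not the paper's exact protocol.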