Lane Lewis, Xaq Pitkow, Leila Wehbe
Neuroscience Institute, Carnegie Mellon University.
Machine Learning Dept., Carnegie Mellon University.
bioRxiv. 2025 Jun 3:2025.06.02.657514. doi: 10.1101/2025.06.02.657514.
How does the brain process language over time? Research suggests that natural human language is processed hierarchically across brain regions over time. However, attempts to characterize this computation have so far been limited to tightly controlled experimental settings that capture only a coarse picture of the brain dynamics underlying natural language comprehension. The recent emergence of LLM encoding models promises a new avenue for discovering and characterizing rich semantic information in the brain, yet interpretable methods for linking information in LLMs to language processing over time remain limited. In this work, we develop a low-rank tensor regression method that decomposes LLM encoding models into interpretable components of semantics, time, and brain-region activation, and apply it to a magnetoencephalography (MEG) dataset in which subjects listened to narrative stories. With only a few components, the method outperforms a standard ridge regression encoding model, suggesting that low-rank models provide a good inductive bias for language encoding. In addition, our method discovers a diverse spectrum of interpretable response components that are sensitive to a rich set of low-level and semantic language features, showing that it can separate distinct language-processing features in neural signals. After controlling for low-level auditory and sentence features, the method also captures semantic features better. By decomposing neural responses to language features, low-rank tensor encoding models yield improved encoding performance and interpretable processing components, making our method a useful tool for uncovering language processing in naturalistic settings.
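The decomposition described in the abstract can be illustrated with a small sketch. This is a toy example, not the authors' implementation: it assumes a rank-R CP factorization of the encoding weight tensor over stimulus-feature, time-lag, and sensor axes, fit here by alternating least squares on simulated data. All dimensions, variable names, and the fitting procedure are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: samples, stimulus features, time lags,
# MEG sensors, and decomposition rank. A real MEG dataset is far larger.
N, F, T, S, R = 200, 10, 5, 8, 2

# Simulated lagged stimulus features and a ground-truth rank-R weight tensor
X = rng.standard_normal((N, F, T))
A0 = rng.standard_normal((F, R))   # "semantic" (feature) factors
B0 = rng.standard_normal((T, R))   # temporal-response factors
C0 = rng.standard_normal((S, R))   # sensor-topography factors
W0 = np.einsum('fr,tr,sr->fts', A0, B0, C0)
Y = np.einsum('nft,fts->ns', X, W0) + 0.1 * rng.standard_normal((N, S))

# Alternating least squares: with two factor matrices fixed, the prediction
# is linear in the third, so each update reduces to an ordinary
# least-squares solve.
A = rng.standard_normal((F, R))
B = rng.standard_normal((T, R))
C = rng.standard_normal((S, R))
for sweep in range(10):
    # Solve for the feature factors A with B, C fixed
    M = np.einsum('nft,tr,sr->nsfr', X, B, C).reshape(N * S, F * R)
    A = np.linalg.lstsq(M, Y.reshape(N * S), rcond=None)[0].reshape(F, R)
    # Solve for the temporal factors B with A, C fixed
    M = np.einsum('nft,fr,sr->nstr', X, A, C).reshape(N * S, T * R)
    B = np.linalg.lstsq(M, Y.reshape(N * S), rcond=None)[0].reshape(T, R)
    # Solve for the sensor factors C with A, B fixed
    Z = np.einsum('nft,fr,tr->nr', X, A, B)        # per-component drive, (N, R)
    C = np.linalg.lstsq(Z, Y, rcond=None)[0].T     # (S, R)

W = np.einsum('fr,tr,sr->fts', A, B, C)
mse = np.mean((np.einsum('nft,fts->ns', X, W) - Y) ** 2)
print(f"residual MSE: {mse:.4f}")
print(f"full weight tensor: {F*T*S} params, rank-{R} model: {R*(F+T+S)} params")
```

The parameter-count comparison in the last line shows the inductive bias the abstract alludes to: a full encoding weight tensor needs F·T·S parameters per model, while a rank-R decomposition needs only R·(F+T+S), and each learned component separates into a feature profile, a temporal response, and a sensor topography that can be inspected independently.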