Singh Chandan, Antonello Richard J, Guo Sihang, Mischler Gavin, Gao Jianfeng, Mesgarani Nima, Huth Alexander G
Microsoft Research, Redmond, WA, USA.
Electrical Engineering Department, Columbia University, NY, USA.
bioRxiv. 2025 Aug 12:2025.08.12.669958. doi: 10.1101/2025.08.12.669958.
Modern data-driven encoding models are highly effective at predicting brain responses to language stimuli. However, these models struggle to explain the underlying phenomena, i.e. what features of the stimulus drive the response? We present Question Answering (QA) encoding models, a method for converting qualitative theories of language selectivity into highly accurate, interpretable models of brain responses. QA encoding models annotate a language stimulus by using a large language model (LLM) to answer yes-no questions corresponding to qualitative theories. A compact QA encoding model that uses only 35 questions outperforms existing baselines at predicting brain responses in both fMRI and ECoG data. The model weights also provide easily interpretable maps of language selectivity across cortex; these maps show quantitative agreement with meta-analyses of the existing literature and with selectivity maps identified in a follow-up fMRI experiment. These results demonstrate that LLMs can bridge the widening gap between qualitative scientific theories and data-driven models.
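The annotation-then-regression idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the three questions, the keyword-matching function `answer_questions` that stands in for the LLM, the synthetic responses, and the use of closed-form ridge regression as the linear model. It shows only the general shape of the approach: each stimulus becomes a vector of yes/no answers, and a linear map from those answers to responses gives one interpretable weight per question and response channel.

```python
import numpy as np

# Hypothetical questions with keyword lists; the paper's actual 35 questions
# are not given in the abstract, so these are placeholders.
QUESTIONS = {
    "Does the text mention a number?": ("one", "two", "three"),
    "Does the text mention a person?": ("she", "he", "mother"),
    "Does the text describe a location?": ("house", "street", "city"),
}

def answer_questions(text):
    """Stand-in for the LLM: 1.0 for 'yes', 0.0 for 'no', one entry per question."""
    words = set(text.lower().split())
    return np.array([float(any(k in words for k in kws))
                     for kws in QUESTIONS.values()])

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

stimuli = [
    "she walked down the street",
    "two plus three",
    "the house was quiet",
    "he counted one by one",
]

# Feature matrix of yes/no answers, shape (n_stimuli, n_questions).
X = np.stack([answer_questions(s) for s in stimuli])

# Synthetic "responses" for 5 channels (assumed data, not real recordings).
rng = np.random.default_rng(0)
true_W = rng.normal(size=(len(QUESTIONS), 5))
Y = X @ true_W + 0.01 * rng.normal(size=(len(stimuli), 5))

# Each row of W holds one question's weight on every response channel.
W = fit_ridge(X, Y, alpha=0.1)
```

In this sketch, row `i` of `W` can be read as the selectivity of each response channel for question `i`, which is the sense in which such weights are directly interpretable.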