Wang Yao, Xue Tiantian, Yang Xingyu
Cognitive Science and Allied Health School, Beijing Language and Culture University, Beijing, China.
Institute of Life and Health Sciences, Beijing Language and Culture University, Beijing, China.
Front Neurosci. 2025 Jul 30;19:1656519. doi: 10.3389/fnins.2025.1656519. eCollection 2025.
Contextual embeddings, a core component of large language models (LLMs) that generate dynamic vector representations capturing words' semantic properties, have demonstrated structural similarities to brain activity patterns at the single-word level. This alignment supports the theoretical framework proposing vector-based neural coding for natural language processing in the brain, where linguistic units may be represented as context-sensitive vectors analogous to LLM-derived embeddings. Building on this framework, we hypothesize that cumulative distance metrics between contextual embeddings of adjacent linguistic units (words/Chinese characters) in sentence contexts may quantitatively reflect neural activation intensity during reading comprehension.
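As a rough illustration of the hypothesis above, the following minimal sketch sums the distances between embeddings of adjacent units in one sentence. The abstract does not specify which distance metrics were used, so the Euclidean and cosine distances here, the function names, and the toy two-dimensional vectors are all assumptions made for illustration; real contextual embeddings would come from an LLM and have far more dimensions.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """One minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def cumulative_distance(embeddings, metric=euclidean):
    """Sum the metric over each pair of adjacent embeddings in a sentence."""
    return sum(metric(a, b) for a, b in zip(embeddings, embeddings[1:]))

# Hypothetical toy "sentence" of three units with 2-D embeddings
# (illustrative values only, not data from the study).
toy_sentence = [[1.0, 0.0], [0.0, 1.0], [0.0, 2.0]]

euclid_total = cumulative_distance(toy_sentence)
cosine_total = cumulative_distance(toy_sentence, cosine_distance)
```

Under this sketch, a sentence whose neighbouring units lie far apart in embedding space receives a larger cumulative distance, which is the quantity hypothesized to track neural activation intensity.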
Using large-scale EEG datasets collected during reading tasks, we systematically investigated the relationship between these computationally derived distance features and frequency-specific band power measures associated with neural activity.
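To make the notion of frequency-specific band power concrete, here is a minimal sketch that estimates the power of one EEG channel inside a frequency band from a simple FFT periodogram. It is not the authors' pipeline: the `band_power` helper, the 256 Hz sampling rate, the synthetic signal, and the band edges (8 to 13 Hz for alpha, 30 to 100 Hz for gamma) are assumptions chosen for illustration, and published analyses often use Welch's method or wavelets instead.

```python
import numpy as np

def band_power(signal, fs, low, high):
    """Mean-square power of `signal` in the band [low, high) Hz.

    Uses a plain one-sided periodogram; assumes a single channel
    sampled uniformly at `fs` Hz.
    """
    x = np.asarray(signal, dtype=float)
    n = x.size
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum = np.fft.rfft(x - x.mean())
    power = np.abs(spectrum) ** 2 / n ** 2
    # Fold in the negative frequencies (every bin except DC and Nyquist).
    power[1:] *= 2.0
    if n % 2 == 0:
        power[-1] /= 2.0
    mask = (freqs >= low) & (freqs < high)
    return float(power[mask].sum())

# Synthetic 2-second "EEG" trace (illustrative only): a 40 Hz component
# with amplitude 1.0 plus a 10 Hz component with amplitude 0.5.
fs = 256
t = np.arange(2 * fs) / fs
eeg = np.sin(2 * np.pi * 40 * t) + 0.5 * np.sin(2 * np.pi * 10 * t)

gamma_power = band_power(eeg, fs, 30, 100)  # assumed gamma band edges
alpha_power = band_power(eeg, fs, 8, 13)    # assumed alpha band edges
```

A band power value computed per sentence in this way could then be related to the sentence's cumulative embedding distance, for example with a regression or correlation analysis.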
Gamma-band power was associated with various NLP features in the ChineseEEG dataset, whereas no comparable gamma-specific effects were observed in the ZuCo1.0 dataset. Significant effects were also found in other frequency bands in both datasets.
The mixed yet intriguing results invite a deeper discussion of the direction (positive or negative) of the associations observed in the gamma and other frequency bands, their cognitive implications, and the potential influence of textual characteristics on these findings. Although the observed effects may be partly text- or dataset-dependent, our analyses revealed associations between various distance metrics and neural responses, consistent with predictions derived from the vector-based neural coding framework.