Department of Mathematics, Western University, London, Ontario, Canada; Fields Laboratory for Network Science, Fields Institute, Toronto, Ontario, Canada.
Department of Philosophy, University of California at San Diego, San Diego, CA, USA.
Trends Neurosci. 2024 Oct;47(10):788-802. doi: 10.1016/j.tins.2024.08.006. Epub 2024 Sep 27.
The capabilities of transformer networks such as ChatGPT and other large language models (LLMs) have captured the world's attention. The crucial computational mechanism underlying their performance relies on transforming a complete input sequence - for example, all the words in a sentence - into a long 'encoding vector' that allows transformers to learn long-range temporal dependencies in naturalistic sequences. Specifically, 'self-attention' applied to this encoding vector enhances temporal context in transformers by computing associations between pairs of words in the input sequence. We suggest that waves of neural activity traveling across single cortical areas, or across multiple regions at the whole-brain scale, could implement a similar encoding principle. By encapsulating recent input history into a single spatial pattern at each moment in time, cortical waves may enable a temporal context to be extracted from sequences of sensory inputs, using the same computational principle as transformers.
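To make the pairwise-association idea concrete, the following is a minimal NumPy sketch of scaled dot-product self-attention applied to a toy sequence. It is an illustrative assumption, not code from the article or from any particular LLM: the function name self_attention, the projection matrices Wq, Wk, Wv, and the toy dimensions are all hypothetical, chosen only to show how each output position mixes information from every other position in the sequence.

```python
# Minimal sketch (assumed, illustrative) of scaled dot-product self-attention.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) token embeddings; Wq/Wk/Wv: learned projection matrices."""
    Q = X @ Wq                       # queries
    K = X @ Wk                       # keys
    V = X @ Wv                       # values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # associations between every pair of positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ V               # each output vector mixes context from the whole sequence

# Toy usage with random embeddings standing in for a 5-word "sentence".
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 16, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8): one context-enriched vector per input word
```

The attention weights here are the pairwise associations the abstract refers to: each row of the softmaxed score matrix specifies how much temporal context every other position contributes to the current one.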