Sandbrink Kai, Bauer Jan P, Proca Alexandra M, Saxe Andrew M, Summerfield Christopher, Hummos Ali
Experimental Psychology, University of Oxford; Brain Mind Institute, EPFL; ELSC, Hebrew University; Gatsby Unit, UCL.
ArXiv. 2025 Jan 16:arXiv:2411.03840v2.
Animals survive in dynamic environments changing at arbitrary timescales, but such data distribution shifts are a challenge to neural networks. To adapt to change, neural systems may change a large number of parameters, which is a slow process that involves forgetting past information. In contrast, animals leverage distribution changes to segment their stream of experience into tasks and associate these with internal task abstractions. Animals can then respond by selecting the appropriate task abstraction. However, how such flexible task abstractions may arise in neural systems remains unknown. Here, we analyze a linear gated network where the weights and gates are jointly optimized via gradient descent, but with neuron-like constraints on the gates, including a faster timescale, nonnegativity, and bounded activity. We observe that the weights self-organize into modules specialized for the tasks or sub-tasks encountered, while the gating layer forms unique representations that switch in the appropriate weight modules (task abstractions). We analytically reduce the learning dynamics to an effective eigenspace, revealing a virtuous cycle: fast-adapting gates drive weight specialization by protecting previous knowledge, while weight specialization in turn increases the update rate of the gating layer. Task switching in the gating layer accelerates as a function of curriculum block size and task training, mirroring key findings in cognitive neuroscience. We show that the discovered task abstractions support generalization through both task and subtask composition, and we extend our findings to a non-linear network switching between two tasks. Overall, our work offers a theory of cognitive flexibility in animals as arising from joint gradient descent on synaptic and neural gating in a neural network architecture.
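The architecture described above lends itself to a compact sketch. Below is a minimal, illustrative PyTorch implementation of the kind of gated linear network the abstract describes: weight modules and a gate vector trained jointly by gradient descent, with the gates given a faster timescale (a larger learning rate) and clamped to be nonnegative and bounded. All dimensions, learning rates, the block curriculum, and variable names are assumptions chosen for illustration, not the paper's exact configuration.

```python
# Minimal sketch of a gated linear network with jointly trained weights and
# gates, under neuron-like constraints on the gates (faster timescale,
# nonnegativity, bounded activity). Hyperparameters are illustrative only.
import torch

torch.manual_seed(0)
d_in, d_out, n_modules = 10, 5, 2

# Weight modules W_k (slow) and gates c_k (fast); effective map is sum_k c_k W_k.
W = 0.1 * torch.randn(n_modules, d_out, d_in)
W.requires_grad_()
c = torch.full((n_modules,), 0.5, requires_grad=True)

# Faster timescale for the gates: a larger learning rate than the weights.
opt = torch.optim.SGD([
    {"params": [W], "lr": 1e-2},  # slow synaptic weights
    {"params": [c], "lr": 5e-1},  # fast gates
])

# Two linear teacher tasks, presented in alternating curriculum blocks.
teachers = [torch.randn(d_out, d_in) for _ in range(2)]

for block in range(6):
    T = teachers[block % 2]
    for step in range(200):
        x = torch.randn(d_in)
        y = T @ x
        # Gated forward pass: y_hat = (sum_k c_k W_k) x
        y_hat = torch.einsum("k,koi,i->o", c, W, x)
        loss = torch.nn.functional.mse_loss(y_hat, y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            # Neuron-like constraints: nonnegative, bounded gate activity.
            c.clamp_(0.0, 1.0)
    print(f"block {block} (task {block % 2}): gates = {c.detach().numpy().round(2)}")
```

Under a sufficiently long block curriculum, one would expect each gate to saturate for its task while the corresponding weight module specializes, with gate switching at block boundaries becoming faster over training, consistent with the virtuous cycle the abstract describes.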