Babson College, Mathematics, Analytics, Science, and Technology Division, Wellesley, MA 02481, U.S.A.
University of Notre Dame, Department of Applied and Computational Mathematics and Statistics, Notre Dame, IN 46556, U.S.A.
Neural Comput. 2024 Jul 19;36(8):1568-1600. doi: 10.1162/neco_a_01681.
In computational neuroscience, recurrent neural networks are widely used to model neural activity and learning. In many studies, fixed points of recurrent neural networks are used to model neural responses to static or slowly changing stimuli, such as visual cortical responses to static visual stimuli. These applications raise the question of how to train the weights in a recurrent neural network to minimize a loss function evaluated on fixed points. In parallel, training fixed points is a central topic in the study of deep equilibrium models in machine learning. A natural approach is to use gradient descent on the Euclidean space of weights. We show that this approach can lead to poor learning performance due in part to singularities that arise in the loss surface. We use a reparameterization of the recurrent network model to derive two alternative learning rules that produce more robust learning dynamics. We demonstrate that these learning rules avoid singularities and learn more effectively than standard gradient descent. The new learning rules can be interpreted as steepest descent and gradient descent, respectively, under a non-Euclidean metric on the space of recurrent weights. Our results question the common, implicit assumption that learning in the brain should be expected to follow the negative Euclidean gradient of synaptic weights.
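To make the baseline concrete, below is a minimal sketch (not taken from the paper) of the "natural approach" the abstract describes: plain Euclidean gradient descent on a loss evaluated at a fixed point of a simple rate network x = tanh(Wx + u), with the gradient obtained by implicit differentiation at the fixed point. The network size, target pattern, learning rate, and stability of the fixed-point iteration are all illustrative assumptions, and this is the baseline the paper argues can suffer from singularities, not the proposed learning rules.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20                                   # network size (illustrative)
W = 0.1 * rng.standard_normal((n, n))    # small recurrent weights -> stable fixed point
u = rng.standard_normal(n)               # static external input
y = 0.5 * rng.standard_normal(n)         # target fixed-point activity (placeholder)

def fixed_point(W, u, iters=500, tol=1e-10):
    """Iterate x <- tanh(W x + u) until (numerical) convergence."""
    x = np.zeros(len(u))
    for _ in range(iters):
        x_new = np.tanh(W @ x + u)
        if np.max(np.abs(x_new - x)) < tol:
            return x_new
        x = x_new
    return x

def euclidean_grad(W, u, y):
    """Gradient of L = 0.5 * ||x* - y||^2 w.r.t. W by implicit differentiation,
    where x* solves x = tanh(W x + u)."""
    x = fixed_point(W, u)
    d = 1.0 - x**2                        # tanh'(W x* + u), since tanh(W x* + u) = x*
    J = d[:, None] * W - np.eye(len(x))   # Jacobian of F(x) = tanh(W x + u) - x
    g = x - y                             # dL/dx*
    lam = np.linalg.solve(J.T, -g)        # adjoint: J^T lam = -g
    return np.outer(lam * d, x), 0.5 * np.sum(g**2)

# Plain gradient descent in the Euclidean space of weights.
lr = 0.05
for step in range(200):
    grad, loss = euclidean_grad(W, u, y)
    W -= lr * grad
    if step % 50 == 0:
        print(f"step {step:3d}  loss {loss:.6f}")
```

Note that the adjoint solve fails exactly when the Jacobian d[:, None] * W - I becomes singular, i.e., when the fixed point loses stability, which is the kind of singularity in the loss landscape that motivates the paper's alternative, non-Euclidean learning rules.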