Elucidating the Theoretical Underpinnings of Surrogate Gradient Learning in Spiking Neural Networks
Gygax Julia, Zenke Friedemann
Friedrich Miescher Institute for Biomedical Research and Faculty of Science, University of Basel, Basel 4056, Switzerland
Neural Comput. 2025 Apr 17;37(5):886-925. doi: 10.1162/neco_a_01752.
Training spiking neural networks to approximate universal functions is essential for studying information processing in the brain and for neuromorphic computing. Yet the binary nature of spikes poses a challenge for direct gradient-based training. Surrogate gradients have been empirically successful in circumventing this problem, but their theoretical foundation remains elusive. Here, we investigate the relation of surrogate gradients to two theoretically well-founded approaches. On the one hand, we consider smoothed probabilistic models, which, due to the lack of support for automatic differentiation, are impractical for training multilayer spiking neural networks but provide derivatives equivalent to surrogate gradients for single neurons. On the other hand, we investigate stochastic automatic differentiation, which is compatible with discrete randomness but has not yet been used to train spiking neural networks. We find that the latter gives surrogate gradients a theoretical basis in stochastic spiking neural networks, where the surrogate derivative matches the derivative of the neuronal escape noise function. This finding supports the effectiveness of surrogate gradients in practice and suggests their suitability for stochastic spiking neural networks. However, surrogate gradients are generally not gradients of a surrogate loss despite their relation to stochastic automatic differentiation. Nevertheless, we empirically confirm the effectiveness of surrogate gradients in stochastic multilayer spiking neural networks and discuss their relation to deterministic networks as a special case. Our work gives theoretical support to surrogate gradients and the choice of a suitable surrogate derivative in stochastic spiking neural networks.
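To make the mechanism concrete for readers unfamiliar with surrogate gradients, below is a minimal sketch (not the authors' code) in PyTorch: the forward pass emits a binary spike through the non-differentiable Heaviside step, while the backward pass substitutes the derivative of a sigmoidal escape-noise function, the pairing the abstract identifies as theoretically grounded for stochastic spiking neural networks. The class name `SurrogateSpike`, the steepness `beta`, and the toy usage are illustrative assumptions.

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass; surrogate derivative in the backward pass."""

    beta = 5.0  # assumed steepness of the sigmoidal escape-noise function

    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()  # binary spike: derivative is zero almost everywhere

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        # Surrogate derivative: d/dv sigmoid(beta * v) = beta * sig * (1 - sig).
        # In a stochastic neuron that spikes with probability sigmoid(beta * v),
        # this is exactly the derivative of the escape-noise function.
        sig = torch.sigmoid(SurrogateSpike.beta * v)
        return grad_output * SurrogateSpike.beta * sig * (1.0 - sig)

# Toy usage: gradients flow through the spike despite the step nonlinearity.
w = torch.randn(10, requires_grad=True)   # hypothetical input weights
x = torch.rand(10)                        # hypothetical input
v = torch.dot(w, x) - 1.0                 # membrane potential minus threshold
s = SurrogateSpike.apply(v)               # deterministic spike (0 or 1)
s.backward()                              # w.grad is nonzero thanks to the surrogate
```

In the stochastic setting the paper analyzes, the forward pass would instead draw `torch.bernoulli(torch.sigmoid(SurrogateSpike.beta * v))` while the backward pass above stays unchanged, which is the sense in which the surrogate derivative matches the derivative of the neuronal escape-noise function.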