Mikaitis Mantas, Pineda García Garibaldi, Knight James C, Furber Steve B
Advanced Processor Technologies, Faculty of Science and Engineering, School of Computer Science, University of Manchester, Manchester, United Kingdom.
Centre for Computational Neuroscience and Robotics, School of Engineering and Informatics, University of Sussex, Brighton, United Kingdom.
Front Neurosci. 2018 Feb 27;12:105. doi: 10.3389/fnins.2018.00105. eCollection 2018.
SpiNNaker is a digital neuromorphic architecture, designed specifically for the low-power simulation of large-scale spiking neural networks at speeds close to biological real-time. Unlike other neuromorphic systems, SpiNNaker allows users to develop their own neuron and synapse models as well as specify arbitrary connectivity. As a result, SpiNNaker has proved to be a powerful tool for studying different neuron models as well as synaptic plasticity, which is believed to be one of the main mechanisms behind learning and memory in the brain. A number of Spike-Timing-Dependent Plasticity (STDP) rules have already been implemented on SpiNNaker and have been shown to be capable of solving various learning tasks in real-time. However, while STDP is an important biological theory of learning, it is a form of Hebbian or unsupervised learning and therefore does not explain behaviors that depend on feedback from the environment. Instead, learning rules based on neuromodulated STDP (three-factor learning rules) have been shown to be capable of solving reinforcement learning tasks in a biologically plausible manner. In this paper we demonstrate for the first time how a model of three-factor STDP, with the third factor representing spikes from dopaminergic neurons, can be implemented on the SpiNNaker neuromorphic system. Using this learning rule, we first show how reward and punishment signals can be delivered to a single synapse, and then demonstrate the rule in a larger network that solves the credit assignment problem in a Pavlovian conditioning experiment. Because of its extra complexity, we find that our three-factor learning rule requires approximately 2× as much processing time as the existing SpiNNaker STDP learning rules. However, we show that it is still possible to run our Pavlovian conditioning model with up to 1 × 10^4 neurons in real-time, opening up new research opportunities for modeling behavioral learning on SpiNNaker.
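To make the idea of a three-factor rule concrete, the following is a minimal sketch of a single dopamine-modulated synapse. It is not the SpiNNaker implementation described in the paper. The abstract gives no equations, so the sketch assumes a common formulation in which pair-based STDP feeds a slowly decaying eligibility trace `c`, dopamine spikes raise a decaying concentration `d`, and the weight changes in proportion to `c * d`. The function name `simulate_synapse` and every parameter value are hypothetical and chosen only for illustration.

```python
import math


def simulate_synapse(pre_spikes, post_spikes, dopamine_spikes, n_steps=1000,
                     w0=0.5, a_plus=0.1, a_minus=0.12,
                     tau_plus=20.0, tau_minus=20.0,
                     tau_c=1000.0, tau_d=200.0, da_step=0.01,
                     w_min=0.0, w_max=1.0):
    """Simulate one synapse under an assumed three-factor STDP rule.

    Spike times are integer time steps of 1 ms. All constants are
    illustrative assumptions, not values taken from the paper.
    """
    pre, post, dopa = set(pre_spikes), set(post_spikes), set(dopamine_spikes)

    # Per-step exponential decay factors for a 1 ms time step.
    decay_plus = math.exp(-1.0 / tau_plus)
    decay_minus = math.exp(-1.0 / tau_minus)
    decay_c = math.exp(-1.0 / tau_c)
    decay_d = math.exp(-1.0 / tau_d)

    x_pre = 0.0   # trace of recent presynaptic spikes
    x_post = 0.0  # trace of recent postsynaptic spikes
    c = 0.0       # eligibility trace (factors one and two combined)
    d = 0.0       # dopamine concentration (the third factor)
    w = w0

    for t in range(n_steps):
        x_pre *= decay_plus
        x_post *= decay_minus
        c *= decay_c
        d *= decay_d

        if t in pre:
            # Pre after post: tag the synapse for depression.
            c -= a_minus * x_post
            x_pre += 1.0
        if t in post:
            # Post after pre: tag the synapse for potentiation.
            c += a_plus * x_pre
            x_post += 1.0
        if t in dopa:
            d += da_step

        # The weight moves only while eligibility and dopamine coincide.
        w += c * d
        w = min(max(w, w_min), w_max)

    return w


if __name__ == "__main__":
    print(simulate_synapse([10], [15], []))      # pairing without reward
    print(simulate_synapse([10], [15], [100]))   # pairing followed by reward
```

In this sketch a pre-then-post pairing on its own leaves the weight untouched, and the weight only changes if dopamine arrives while the eligibility trace is still non-zero. That delay tolerance is what lets a rule of this kind link a reward to the spike pairing that preceded it.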