Sanda Pavel, Skorheim Steven, Bazhenov Maxim
Department of Medicine, University of California, San Diego, La Jolla, California, United States of America.
Information and Systems Sciences Lab, HRL Laboratories, LLC, Malibu, California, United States of America.
PLoS Comput Biol. 2017 Sep 29;13(9):e1005705. doi: 10.1371/journal.pcbi.1005705. eCollection 2017 Sep.
Neural networks with a single plastic layer employing reward-modulated spike time dependent plasticity (STDP) are capable of learning simple foraging tasks. Here we demonstrate advanced pattern discrimination and continuous learning in a network of spiking neurons with multiple plastic layers. The network used both reward-modulated and non-reward-modulated STDP and implemented multiple mechanisms for homeostatic regulation of synaptic efficacy, including heterosynaptic plasticity, gain control, output balancing, activity normalization of rewarded STDP, and hard limits on synaptic strength. We found that adding a hidden layer of neurons employing non-rewarded STDP produced neurons that responded to specific combinations of inputs and thus performed basic classification of the input patterns. When combined with a subsequent layer of neurons implementing rewarded STDP, the network learned to discriminate between rewarding patterns and patterns designated as punishing, despite the absence of labeled training data. Synaptic noise allowed for trial-and-error learning that helped identify the goal-oriented strategies that were effective in solving the task. The study predicts a critical set of properties of a spiking neuronal network with STDP that is sufficient to solve a complex foraging task involving pattern classification and decision making.
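To make three of the mechanisms named in the abstract concrete, here is a minimal sketch of a reward-gated STDP update, hard limits on synaptic strength, and output balancing. It is not the authors' implementation. The exponential STDP window, the multiplication of the STDP term by a scalar reward, and all names and parameter values (`stdp_term`, `rewarded_update`, `balance_outputs`, `lr`, `tau`, `w_min`, `w_max`) are illustrative assumptions that do not come from the abstract.

```python
import math


def stdp_term(dt, a_plus=1.0, a_minus=1.0, tau=20.0):
    """Pair-based STDP window for dt = t_post - t_pre in ms (assumed form)."""
    if dt > 0:
        # Presynaptic spike precedes postsynaptic spike: candidate potentiation.
        return a_plus * math.exp(-dt / tau)
    if dt < 0:
        # Postsynaptic spike precedes presynaptic spike: candidate depression.
        return -a_minus * math.exp(dt / tau)
    return 0.0


def rewarded_update(w, dt, reward, lr=0.1, w_min=0.0, w_max=1.0):
    """Gate the STDP term by a scalar reward, then clip to hard limits.

    reward > 0 reinforces the pairing, reward < 0 reverses its sign,
    and reward == 0 leaves the weight unchanged (assumed gating rule).
    """
    w_new = w + lr * reward * stdp_term(dt)
    return min(w_max, max(w_min, w_new))


def balance_outputs(weights, target_sum):
    """Rescale one neuron's outgoing weights so that they sum to target_sum."""
    total = sum(weights)
    if total == 0:
        return list(weights)
    return [w * target_sum / total for w in weights]


if __name__ == "__main__":
    w = 0.5
    w = rewarded_update(w, dt=5.0, reward=1.0)   # rewarded pre-before-post pairing
    w = rewarded_update(w, dt=5.0, reward=-1.0)  # same pairing, punished
    print(w)
    print(balance_outputs([0.2, 0.6], target_sum=1.0))
```

In this sketch the same spike pairing strengthens the synapse when followed by reward and weakens it when followed by punishment, which is the property that lets a rewarded layer learn without labeled training data. Setting `reward` to a constant 1.0 reduces `rewarded_update` to plain, non-rewarded STDP of the kind the abstract attributes to the hidden layer.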