Reinforcement learning with modulated spike timing dependent synaptic plasticity.
Author information
Farries Michael A, Fairhall Adrienne L
Affiliation
Department of Biology, University of Texas at San Antonio, San Antonio, TX 78249, USA.
Publication information
J Neurophysiol. 2007 Dec;98(6):3648-65. doi: 10.1152/jn.00364.2007. Epub 2007 Oct 10.
Spike timing-dependent synaptic plasticity (STDP) has emerged as the preferred framework linking patterns of pre- and postsynaptic activity to changes in synaptic strength. Although synaptic plasticity is widely believed to be a major component of learning, it is unclear how STDP itself could serve as a mechanism for general purpose learning. On the other hand, algorithms for reinforcement learning work on a wide variety of problems, but lack an experimentally established neural implementation. Here, we combine these paradigms in a novel model in which a modified version of STDP achieves reinforcement learning. We build this model in stages, identifying a minimal set of conditions needed to make it work. Using a performance-modulated modification of STDP in a two-layer feedforward network, we can train output neurons to generate arbitrarily selected spike trains or population responses. Furthermore, a given network can learn distinct responses to several different input patterns. We also describe in detail how this model might be implemented biologically. Thus our model offers a novel and biologically plausible implementation of reinforcement learning that is capable of training a neural population to produce a very wide range of possible mappings between synaptic input and spiking output.
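The abstract describes a performance-modulated modification of STDP but gives no equations. As a rough illustration of the general idea only, the sketch below scales a standard pair-based STDP term by how much a scalar performance signal exceeds a baseline. It is not the authors' model: the exponential kernel, the function names (`stdp_window`, `eligibility`, `modulated_update`), the baseline subtraction, the weight clipping, and all parameter values are assumptions chosen for the example.

```python
import numpy as np

def stdp_window(dt, a_plus=1.0, a_minus=1.0, tau_plus=20.0, tau_minus=20.0):
    """Assumed pair-based STDP kernel; dt = t_post - t_pre in ms."""
    dt = np.asarray(dt, dtype=float)
    # Pre-before-post (dt > 0) potentiates, post-before-pre (dt < 0) depresses.
    potentiation = a_plus * np.exp(-np.abs(dt) / tau_plus)
    depression = -a_minus * np.exp(-np.abs(dt) / tau_minus)
    return np.where(dt > 0, potentiation, np.where(dt < 0, depression, 0.0))

def eligibility(pre_times, post_times):
    """Sum the STDP kernel over all pre/post spike pairs of one synapse."""
    pre = np.asarray(pre_times, dtype=float)
    post = np.asarray(post_times, dtype=float)
    if pre.size == 0 or post.size == 0:
        return 0.0
    dt = post[:, None] - pre[None, :]  # all pairwise timing differences
    return float(stdp_window(dt).sum())

def modulated_update(w, pre_times, post_times, performance, baseline,
                     lr=0.01, w_min=0.0, w_max=1.0):
    """Apply the STDP term scaled by (performance - baseline).

    Better-than-baseline performance applies the STDP change as is,
    worse-than-baseline performance reverses its sign (an assumption
    made for this sketch, not a claim about the paper).
    """
    dw = lr * (performance - baseline) * eligibility(pre_times, post_times)
    return float(np.clip(w + dw, w_min, w_max))

# Hypothetical single synapse: presynaptic spike at 10 ms, postsynaptic at 15 ms.
w_rewarded = modulated_update(0.5, [10.0], [15.0], performance=1.0, baseline=0.5)
w_punished = modulated_update(0.5, [10.0], [15.0], performance=0.0, baseline=0.5)
```

In this toy case the same causal spike pairing strengthens the synapse when performance is above baseline and weakens it when performance is below baseline, which is the sense in which the plasticity is modulated rather than purely timing-driven.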