Pennartz C M
California Institute of Technology, Pasadena, USA.
Neuroscience. 1997 Nov;81(2):303-19. doi: 10.1016/s0306-4522(97)00118-8.
A central problem in learning theory is how the vertebrate brain processes reinforcing stimuli in order to master complex sensorimotor tasks. This problem belongs to the domain of supervised learning, in which errors in the response of a neural network serve as the basis for modification of synaptic connectivity in the network and thereby train it on a computational task. The model presented here shows how a reinforcing feedback can modify synapses in a neuronal network according to the principles of Hebbian learning. The reinforcing feedback steers synapses towards long-term potentiation or depression by critically influencing the rise in postsynaptic calcium, in accordance with findings on synaptic plasticity in mammalian brain. An important feature of the model is the dependence of modification thresholds on the previous history of reinforcing feedback processed by the network. The learning algorithm trained networks successfully on a task in which a population vector in the motor output was required to match a sensory stimulus vector presented shortly before. In another task, networks were trained to compute coordinate transformations by combining different visual inputs. The model continued to behave well when simplified units were replaced by single-compartment neurons equipped with several conductances and operating in continuous time. This novel form of reinforcement learning incorporates essential properties of Hebbian synaptic plasticity and thereby shows that supervised learning can be accomplished by a learning rule similar to those used in physiologically plausible models of unsupervised learning. The model can be crudely correlated to the anatomy and electrophysiology of the amygdala, prefrontal and cingulate cortex and has predictive implications for further experiments on synaptic plasticity and learning processes mediated by these areas.
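The abstract describes a reinforcement-modulated Hebbian rule in which feedback steers the postsynaptic calcium signal toward LTP or LTD, with a modification threshold that depends on the history of reinforcement. A minimal sketch of such a three-factor rule is given below; all function names, signatures, and constants are illustrative assumptions, not the paper's actual model.

```python
def update_weight(w, pre, post, reward, theta, lr=0.1):
    """One Hebbian weight update gated by reinforcing feedback.

    pre, post: pre- and postsynaptic activities in [0, 1]
    reward:    reinforcing feedback in [-1, 1]
    theta:     modification threshold (signal above -> LTP, below -> LTD)
    """
    # Reinforcement critically influences the calcium-like rise:
    # positive feedback amplifies it above threshold (potentiation),
    # negative feedback suppresses it below threshold (depression).
    calcium = post * (1.0 + reward)
    dw = lr * pre * (calcium - theta)
    return w + dw


def update_threshold(theta, reward, tau=0.2):
    # The modification threshold slides with the running history of
    # reinforcing feedback, echoing the history dependence in the abstract.
    return theta + tau * (abs(reward) - theta)


# Positive reward drives potentiation, negative reward drives depression.
w_up = update_weight(0.5, pre=1.0, post=0.5, reward=1.0, theta=0.5)
w_down = update_weight(0.5, pre=1.0, post=0.5, reward=-1.0, theta=0.5)
```

Here a single scalar stands in for the postsynaptic calcium transient; the paper's single-compartment simulations with multiple conductances would compute this quantity from membrane dynamics instead.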