Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge MA, USA.
Front Neural Circuits. 2012 Jun 27;6:38. doi: 10.3389/fncir.2012.00038. eCollection 2012.
In its simplest formulation, reinforcement learning is based on the idea that if an action taken in a particular context is followed by a favorable outcome, then, in the same context, the tendency to produce that action should be strengthened, or reinforced. While reinforcement learning forms the basis of many current theories of basal ganglia (BG) function, these models do not incorporate distinct computational roles for signals that convey context, and those that convey what action an animal takes. Recent experiments in the songbird suggest that vocal-related BG circuitry receives two functionally distinct excitatory inputs. One input is from a cortical region that carries context information about the current "time" in the motor sequence. The other is an efference copy of motor commands from a separate cortical brain region that generates vocal variability during learning. Based on these findings, I propose here a general model of vertebrate BG function that combines context information with a distinct motor efference copy signal. The signals are integrated by a learning rule in which efference copy inputs gate the potentiation of context inputs (but not efference copy inputs) onto medium spiny neurons in response to a rewarded action. The hypothesis is described in terms of a circuit that implements the learning of visually guided saccades. The model makes testable predictions about the anatomical and functional properties of hypothesized context and efference copy inputs to the striatum from both thalamic and cortical sources.
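The gating rule described above can be illustrated with a small toy simulation. This is a minimal sketch under stated assumptions, not the model from the paper: the one-hot encoding of context and efference copy, the multiplicative three-factor update, the epsilon-style exploration, and all names (`gated_update`, `train`, `target_action`, `explore`, `lr`) are hypothetical choices made for illustration. Only the context-to-MSN weights are plastic; the efference copy acts purely as a gate and has no learned weights of its own.

```python
import random


def one_hot(index, size):
    """Encode an index as a one-hot vector (illustrative assumption)."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def gated_update(weights, context, efference_copy, reward, lr=0.1):
    """Return updated context->MSN weights after one trial.

    weights[a][c] is the strength of context input c onto the MSN
    channel for action a. A weight changes only when its context input
    was active, the efference copy for that action was active, and the
    action was rewarded (assumed three-factor product).
    """
    return [
        [w + lr * reward * ec * cx for w, cx in zip(row, context)]
        for row, ec in zip(weights, efference_copy)
    ]


def train(target_action, n_actions, n_trials=500, explore=0.3, seed=0):
    """Learn which action is rewarded in each context by trial and error.

    target_action[c] is the rewarded action in context c (hypothetical task).
    """
    rng = random.Random(seed)
    n_contexts = len(target_action)
    weights = [[0.0] * n_contexts for _ in range(n_actions)]
    for _ in range(n_trials):
        c = rng.randrange(n_contexts)
        if rng.random() < explore:
            # Exploratory variability, standing in for the variability-generating input.
            a = rng.randrange(n_actions)
        else:
            # Otherwise take the action most strongly driven by this context.
            a = max(range(n_actions), key=lambda k: weights[k][c])
        reward = 1.0 if a == target_action[c] else 0.0
        weights = gated_update(
            weights, one_hot(c, n_contexts), one_hot(a, n_actions), reward
        )
    return weights


if __name__ == "__main__":
    learned = train(target_action=[1, 0, 2], n_actions=3)
    for c in range(3):
        best = max(range(3), key=lambda k: learned[k][c])
        print(f"context {c}: preferred action {best}")
```

Because the update is a product of context activity, efference copy activity, and reward, a weight is strengthened only for the context-action pairing that actually occurred and was rewarded. In the saccade example, this would correspond to strengthening the link between a visual context and the saccade that was just made, though the mapping of these toy variables onto real circuitry is an assumption.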