Department of Neuroscience, Karolinska Institutet, 171 77 Stockholm, Sweden.
Division of Computational Science and Technology, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, 114 28 Stockholm, Sweden.
Proc Natl Acad Sci U S A. 2023 Aug 8;120(32):e2221994120. doi: 10.1073/pnas.2221994120. Epub 2023 Aug 1.
It is well established that midbrain dopaminergic neurons support reinforcement learning (RL) in the basal ganglia by transmitting a reward prediction error (RPE) to the striatum. In particular, various computational models and experiments have shown that a striatum-wide RPE signal can support RL over a small discrete set of actions (e.g., go/no-go, choose left/right). However, there is accumulating evidence that the basal ganglia function not as a selector between predefined actions but rather as a dynamical system with graded, continuous outputs. To reconcile this view with RL, it must be explained how dopamine could support the learning of continuous outputs rather than discrete action values. Inspired by recent observations that, besides RPE, the firing rates of midbrain dopaminergic neurons correlate with motor and cognitive variables, we propose a model in which the dopamine signal in the striatum carries a vector-valued error feedback signal (a loss gradient) instead of a homogeneous scalar error (a loss). We implement a local, "three-factor" corticostriatal plasticity rule involving the presynaptic firing rate, a postsynaptic factor, and the unique dopamine concentration perceived by each striatal neuron. With this learning rule, we show that such a vector-valued feedback signal increases the capacity to learn a multidimensional series of real-valued outputs. Crucially, we demonstrate that this plasticity rule does not require precise nigrostriatal synapses but remains compatible with experimental observations of randomly placed varicosities and diffuse volume transmission of dopamine.
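To make the described scheme concrete, below is a minimal sketch, not the authors' implementation: it assumes a rate model with a tanh striatal nonlinearity, a fixed linear readout, and a hypothetical random mixing matrix D standing in for diffuse volume transmission from randomly placed varicosities, so each striatal neuron perceives its own dopamine concentration as a random mixture of the vector-valued error components (a feedback-alignment-style approximation of the loss gradient). All names and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cortex, n_striatum, n_out = 50, 200, 3

# Corticostriatal weights (the plastic synapses) and a fixed downstream
# readout mapping striatal rates to a graded, continuous output.
W = rng.normal(0.0, 0.1, (n_striatum, n_cortex))
readout = rng.normal(0.0, 1.0, (n_out, n_striatum)) / np.sqrt(n_striatum)

# Hypothetical "volume transmission" matrix: each striatal neuron reads a
# random mixture of the n_out error components, mimicking randomly placed
# varicosities rather than precise nigrostriatal wiring.
D = rng.normal(0.0, 1.0, (n_striatum, n_out))

# Arbitrary continuous multidimensional target to be learned.
A = rng.normal(0.0, 1.0, (n_out, n_cortex)) / np.sqrt(n_cortex)

eta = 0.05
for step in range(20001):
    x = rng.normal(0.0, 1.0, n_cortex)   # presynaptic cortical firing rates
    r = np.tanh(W @ x)                   # postsynaptic striatal rates
    y = readout @ r                      # continuous multidimensional output
    err = A @ x - y                      # vector-valued error, not a scalar RPE
    dopamine = D @ err                   # unique dopamine level per neuron
    # Three-factor rule: presynaptic rate x postsynaptic factor x local dopamine.
    post = 1.0 - r**2                    # postsynaptic factor (tanh slope)
    W += eta * np.outer(dopamine * post, x)
    if step % 5000 == 0:
        print(f"step {step:6d}  squared error {np.sum(err**2):.4f}")
```

Replacing D with readout.T would make the update the exact loss gradient; the point of this sketch is that a fixed random D, consistent with imprecise dopamine release, is typically sufficient for the error to decrease.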