Lindsey Jack W, Markowitz Jeffrey, Gillis Winthrop F, Datta Sandeep R, Litwin-Kumar Ashok
Kavli Institute for Brain Science, Columbia University, New York, United States.
Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, United States.
eLife. 2025 May 8;13:RP101747. doi: 10.7554/eLife.101747.
Spiny projection neurons (SPNs) in dorsal striatum are often proposed as a locus of reinforcement learning in the basal ganglia. Here, we identify and resolve a fundamental inconsistency between striatal reinforcement learning models and known SPN synaptic plasticity rules. Direct-pathway (dSPN) and indirect-pathway (iSPN) neurons, which promote and suppress actions, respectively, exhibit synaptic plasticity that reinforces activity associated with elevated dopamine release in dSPNs and with suppressed dopamine release in iSPNs. We show that iSPN plasticity prevents successful learning, as it reinforces activity patterns associated with negative outcomes. However, this pathological behavior is reversed if functionally opponent dSPNs and iSPNs, which promote and suppress the current behavior, are simultaneously activated by efferent input following action selection. This prediction is supported by striatal recordings and contrasts with prior models of SPN representations. In our model, learning and action selection signals can be multiplexed without interference, enabling learning algorithms beyond those of standard temporal difference models.
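The contrast described in the abstract can be illustrated with a toy sketch. This is not the authors' model: it assumes a single state, one dSPN and one iSPN weight per action, rectified dopamine dependence (dSPN weights grow only when the dopamine signal `delta` is positive, iSPN weights only when it is negative), and an action preference equal to dSPN weight minus iSPN weight. The function names, the rule labels `"selection"` and `"efference"`, and the learning rate are all hypothetical.

```python
import numpy as np

def spn_update(w_d, w_i, action, delta, rule, lr=0.1):
    """Apply one plasticity step and return updated copies of (w_d, w_i)."""
    chosen = np.zeros(len(w_d))
    chosen[action] = 1.0
    if rule == "efference":
        # Assumption: after selection, the dSPN and the iSPN tied to the
        # chosen action are both activated by efferent input.
        y_d, y_i = chosen, chosen
    elif rule == "selection":
        # Assumption: activity during learning mirrors the selection pattern,
        # with the chosen action's dSPN on and the other actions' iSPNs on.
        y_d, y_i = chosen, 1.0 - chosen
    else:
        raise ValueError(f"unknown rule: {rule}")
    # dSPN weights potentiate with elevated dopamine (delta > 0),
    # iSPN weights potentiate with suppressed dopamine (delta < 0).
    new_w_d = w_d + lr * max(delta, 0.0) * y_d
    new_w_i = w_i + lr * max(-delta, 0.0) * y_i
    return new_w_d, new_w_i

def preference(w_d, w_i):
    """Net drive toward each action: dSPNs promote, iSPNs suppress."""
    return w_d - w_i

# Action 1 was taken and the outcome was worse than expected.
w_d, w_i = np.zeros(2), np.zeros(2)
for rule in ("selection", "efference"):
    new_d, new_i = spn_update(w_d, w_i, action=1, delta=-0.5, rule=rule)
    print(rule, preference(new_d, new_i))
```

Under these assumptions, the `"selection"` rule lowers the preference for the action that was not taken, which makes the punished action relatively more likely. The `"efference"` rule lowers the preference for the punished action itself.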