Franklin Nicholas T, Frank Michael J
Department of Cognitive, Linguistic and Psychological Sciences, Brown Institute for Brain Science, Brown University, Providence, United States.
eLife. 2015 Dec 25;4:e12029. doi: 10.7554/eLife.12029.
Convergent evidence suggests that the basal ganglia support reinforcement learning by adjusting action values according to reward prediction errors. However, adaptive behavior in stochastic environments requires the consideration of uncertainty to dynamically adjust the learning rate. We consider how cholinergic tonically active interneurons (TANs) may endow the striatum with such a mechanism in computational models spanning Marr's three levels of analysis. In the neural model, TANs modulate the excitability of spiny neurons, their population response to reinforcement, and hence the effective learning rate. Long TAN pauses facilitated robustness to spurious outcomes by increasing divergence in synaptic weights between neurons coding for alternative action values, whereas short TAN pauses facilitated stochastic behavior but increased responsiveness to change-points in outcome contingencies. A feedback control system allowed TAN pauses to be dynamically modulated by uncertainty across the spiny neuron population, allowing the system to self-tune and optimize performance across stochastic environments.
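The idea of an uncertainty-dependent learning rate can be illustrated with a minimal sketch at the algorithmic level. It is not the authors' neural model: the softmax choice rule, the use of normalized entropy over choice probabilities as the uncertainty signal, the direction of the mapping (more uncertainty gives a larger learning rate), and all names and parameter values (`beta`, `lr_min`, `lr_max`, `update`) are assumptions made here for illustration only.

```python
import math


def softmax(values, beta=3.0):
    """Convert action values to choice probabilities (beta is an assumed inverse temperature)."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy of the choice distribution, scaled to the range [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def update(values, action, reward, lr_min=0.05, lr_max=0.5, beta=3.0):
    """One prediction-error update whose learning rate grows with uncertainty.

    Assumption: uncertainty is the normalized entropy of the softmax choice
    probabilities, and the learning rate is interpolated linearly between
    lr_min (no uncertainty) and lr_max (maximal uncertainty).
    """
    uncertainty = normalized_entropy(softmax(values, beta))
    lr = lr_min + (lr_max - lr_min) * uncertainty
    new_values = list(values)
    # Delta rule: move the chosen action's value toward the observed reward.
    new_values[action] += lr * (reward - values[action])
    return new_values, lr
```

With equal action values the choice distribution is maximally uncertain, so `update([0.0, 0.0], 0, 1.0)` uses the largest learning rate; once one value clearly dominates, the same call takes a smaller step.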