Stocco Andrea
Institute for Learning and Brain Sciences, University of Washington, Seattle, WA, USA.
Front Neurosci. 2012 Feb 6;6:18. doi: 10.3389/fnins.2012.00018. eCollection 2012.
The basal ganglia play a fundamental role in decision-making. Their contribution is typically modeled within a reinforcement learning framework, with the basal ganglia learning to select the options associated with the highest value and their dopamine inputs conveying performance feedback. This basic framework, however, does not account for the role of cholinergic interneurons in the striatum, and does not easily explain certain dynamic aspects of decision-making and skill acquisition, such as the generation of exploratory actions. This paper describes basal ganglia acetylcholine-based entropy (BABE), a model of the acetylcholine system in the striatum that provides a unified explanation for these phenomena. According to this model, cholinergic interneurons in the striatum control the level of variability in behavior by modulating the number of possible responses that are considered by the basal ganglia, as well as the level of competition between them. This mechanism provides a natural way to account for the role of the basal ganglia in generating behavioral variability during the acquisition of certain cognitive skills, as well as in modulating exploration and exploitation in decision-making. Compared to a typical reinforcement learning model, BABE showed a greater modulation of response variability in the face of changes in the reward contingencies, allowing for faster learning (and re-learning) of option values. Finally, the paper discusses the possible applications of the model to other domains.
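The general idea of letting response variability rise and fall with changes in reward contingencies can be illustrated with a minimal sketch. The abstract does not give BABE's equations, so everything in the sketch is an assumption made for illustration: the softmax choice rule, the use of a single `temperature` parameter as a stand-in for the cholinergic signal, the rule that moves `temperature` toward the size of the latest prediction error, and all names and default parameter values. It should not be read as the model's actual implementation.

```python
import math
import random


def softmax(values, temperature):
    """Return softmax choice probabilities; a higher temperature is flatter."""
    peak = max(values)  # subtract the peak for numerical stability
    weights = [math.exp((v - peak) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


class AdaptiveAgent:
    """Value learner whose response variability tracks recent surprise.

    Hypothetical illustration only: `temperature` stands in for the
    cholinergic signal and is NOT the update rule used by BABE.
    """

    def __init__(self, n_options, learning_rate=0.1, gain=0.3,
                 min_temp=0.05, max_temp=2.0):
        self.values = [0.0] * n_options
        self.learning_rate = learning_rate
        self.gain = gain
        self.min_temp = min_temp
        self.max_temp = max_temp
        self.temperature = max_temp  # start out exploratory

    def choose(self, rng):
        """Sample an option index from the current softmax distribution."""
        probs = softmax(self.values, self.temperature)
        return rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, option, reward):
        """Update the chosen option's value, then adapt the temperature."""
        error = reward - self.values[option]
        self.values[option] += self.learning_rate * error
        # Assumed rule: move the temperature toward a target that grows
        # with the unsigned prediction error, capped at max_temp.
        surprise = min(1.0, abs(error))
        target = self.min_temp + (self.max_temp - self.min_temp) * surprise
        self.temperature += self.gain * (target - self.temperature)
        return error


if __name__ == "__main__":
    rng = random.Random(0)
    agent = AdaptiveAgent(n_options=2)
    for trial in range(400):
        # Option 0 pays off in the first half, option 1 in the second half.
        best = 0 if trial < 200 else 1
        choice = agent.choose(rng)
        agent.update(choice, 1.0 if choice == best else 0.0)
    print(agent.values, agent.temperature)
```

In this sketch, a stable environment drives prediction errors toward zero, so the temperature falls and choices become nearly deterministic. When the rewarded option changes, the resulting errors push the temperature back up and choices become more variable again. A fixed-temperature learner, by contrast, keeps the same level of variability regardless of how surprising its outcomes are.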