The Solomon H. Snyder Department of Neuroscience, Brain Science Institute, Kavli Neuroscience Discovery Institute, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA.
Neuron. 2019 Sep 4;103(5):922-933.e7. doi: 10.1016/j.neuron.2019.06.001. Epub 2019 Jul 4.
Decisions occur in dynamic environments. In the framework of reinforcement learning, the probability of performing an action is influenced by decision variables. Discrepancies between predicted and obtained rewards (reward prediction errors) update these variables, but they are otherwise stable between decisions. Although reward prediction errors have been mapped to midbrain dopamine neurons, it is unclear how the brain represents decision variables themselves. We trained mice on a dynamic foraging task in which they chose between alternatives that delivered reward with changing probabilities. Neurons in the medial prefrontal cortex, including projections to the dorsomedial striatum, maintained persistent firing rate changes over long timescales. These changes stably represented relative action values (to bias choices) and total action values (to bias response times) with slow decay. In contrast, decision variables were weakly represented in the anterolateral motor cortex, a region necessary for generating choices. Thus, we define a stable neural mechanism to drive flexible behavior.
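The reinforcement learning framework summarized above can be illustrated with a minimal sketch. It is not the model fitted in the study: the learning rate `alpha`, the inverse temperature `beta`, the reward probabilities, the block length, and all function names are assumptions chosen for illustration. The sketch shows action values that change only when a reward prediction error arrives, a choice probability that depends on the relative action value, and a total action value computed alongside it (the abstract links total value to response times, which are not simulated here).

```python
import math
import random


def choice_probability(q_left, q_right, beta=5.0):
    """Probability of choosing left, as a logistic function of relative value."""
    return 1.0 / (1.0 + math.exp(-beta * (q_left - q_right)))


def update_value(q, reward, alpha=0.3):
    """Update one action value by a fraction of the reward prediction error."""
    rpe = reward - q  # obtained reward minus predicted reward
    return q + alpha * rpe, rpe


def simulate(n_trials=1000, block_length=100, alpha=0.3, beta=5.0, seed=0):
    """Simulate a two-alternative task with changing reward probabilities.

    All parameter values are hypothetical, not taken from the study.
    """
    rng = random.Random(seed)
    q = {"left": 0.0, "right": 0.0}
    p_reward = {"left": 0.4, "right": 0.1}  # assumed reward probabilities
    history = []
    for t in range(n_trials):
        # Swap the reward probabilities at each block boundary.
        if t > 0 and t % block_length == 0:
            p_reward["left"], p_reward["right"] = p_reward["right"], p_reward["left"]

        # Decision variables held from the previous update until this choice.
        relative = q["left"] - q["right"]
        total = q["left"] + q["right"]

        p_left = choice_probability(q["left"], q["right"], beta)
        choice = "left" if rng.random() < p_left else "right"
        reward = 1.0 if rng.random() < p_reward[choice] else 0.0

        # Only the chosen action's value is updated, via the prediction error.
        q[choice], rpe = update_value(q[choice], reward, alpha)
        history.append(
            {"choice": choice, "reward": reward, "rpe": rpe,
             "relative": relative, "total": total}
        )
    return history


if __name__ == "__main__":
    trials = simulate()
    print(sum(h["reward"] for h in trials) / len(trials))
```

In this sketch the values are constant between trials by construction, which mirrors the stability between decisions that the abstract describes; a model with forgetting would instead decay the unchosen value on each trial.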