Molecular Neurobiology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA.
Molecular Neurobiology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA; Center for Motor Control and Disease, Key Laboratory of Brain Functional Genomics, East China Normal University, Shanghai 200062, China; NYU-ECNU Institute of Brain and Cognitive Science, New York University Shanghai, Shanghai 200062, China.
Curr Biol. 2021 Dec 6;31(23):5350-5363.e5. doi: 10.1016/j.cub.2021.09.040. Epub 2021 Oct 11.
Dopamine has been suggested to encode cue-reward prediction errors during Pavlovian conditioning, signaling discrepancies between the actual reward and the reward expected from the cues. While this theory has been widely applied to reinforcement learning concerning instrumental actions, whether dopamine represents action-outcome prediction errors and how it controls sequential behavior remain largely unknown. The vast majority of previous studies examining dopamine responses have used discrete reward-predictive stimuli, whether Pavlovian conditioned stimuli for which no action is required to earn reward or explicit discriminative stimuli that essentially instruct an animal how and when to respond for reward. Here, by training mice to perform optogenetic intracranial self-stimulation, we examined how self-initiated goal-directed behavior influences nigrostriatal dopamine transmission during single and sequential instrumental actions, in behavioral contexts with minimal overt changes in the animal's external environment. We found that dopamine release evoked by direct optogenetic stimulation was dramatically reduced when delivered as the consequence of the animal's own action, relative to non-contingent passive stimulation. This dopamine suppression generalized to food rewards, was specific to the reinforced action, was temporally restricted to counteract the expected outcome, and exhibited sequence-selectivity consistent with hierarchical control of sequential behavior. These findings demonstrate that nigrostriatal dopamine signals sequence-specific prediction errors in action-outcome associations, with fundamental implications for reinforcement learning and instrumental behavior in health and disease.