Department of Neurobiology and Behavior, Cornell University, Ithaca, NY 14853, United States.
Curr Opin Neurobiol. 2020 Dec;65:1-9. doi: 10.1016/j.conb.2020.08.005. Epub 2020 Sep 6.
It feels rewarding to ace your opponent on match point. Here, we propose that common mechanisms underlie reward and performance learning. First, when a singing bird unexpectedly hits the right note, its dopamine (DA) neurons are activated, just as when a thirsty monkey receives an unexpected juice reward. Second, these DA signals reinforce vocal variations much as they reinforce stimulus-response associations. Third, limbic inputs to DA neurons signal the predicted quality of song syllables much as they signal the predicted reward value of a place or a stimulus during foraging. Finally, songbirds may solve difficult problems in reinforcement learning, such as credit assignment and catastrophic forgetting, with node perturbation and consolidation of reinforced vocal patterns in motor cortical circuits. Consolidation occurs downstream of a canonical 'actor-critic' circuit motif that learns to maximize performance quality in essentially the same way it learns to maximize reward: by computing and learning from prediction errors.
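The actor-critic idea in the abstract can be made concrete with a toy sketch. The code below is not the authors' model; it is a minimal illustration, under stated assumptions, of node perturbation driven by a performance prediction error. The function name `train_node_perturbation`, the quadratic quality measure, and all parameter values are hypothetical choices made for this example. An "actor" adds random variation to its motor output, a "critic" keeps a running prediction of performance quality, and the difference between actual and predicted quality (a stand-in for the DA signal) reinforces whichever variation was just tried. Consolidation and catastrophic forgetting are not modeled.

```python
import random


def train_node_perturbation(target, n_trials=5000, sigma=0.1,
                            actor_lr=0.5, critic_lr=0.1, seed=0):
    """Toy actor-critic learner using node perturbation (illustrative only)."""
    rng = random.Random(seed)
    n = len(target)
    weights = [0.0] * n        # actor: mean motor output for each "note"
    predicted_quality = 0.0    # critic: running estimate of expected quality

    def quality(output):
        # Assumed performance measure: negative squared distance to the target.
        return -sum((o - t) ** 2 for o, t in zip(output, target))

    initial_quality = quality(weights)
    for _ in range(n_trials):
        # Actor explores by perturbing its output with Gaussian noise.
        noise = [rng.gauss(0.0, sigma) for _ in range(n)]
        output = [w + e for w, e in zip(weights, noise)]

        # Prediction error: actual quality minus the critic's prediction.
        delta = quality(output) - predicted_quality

        # Reinforce the perturbation in proportion to the prediction error.
        weights = [w + actor_lr * delta * e for w, e in zip(weights, noise)]

        # Critic moves its prediction toward the observed quality.
        predicted_quality += critic_lr * delta

    return weights, initial_quality, quality(weights)


if __name__ == "__main__":
    learned, before, after = train_node_perturbation([1.0, -0.5, 0.3])
    print("learned output:", [round(w, 3) for w in learned])
    print("quality before:", round(before, 4), "after:", round(after, 4))
```

Because each update multiplies the prediction error by the noise that was actually injected, variations that turned out better than predicted are strengthened and those that turned out worse are weakened. No explicit gradient of the quality function is needed, which is why this kind of scheme is often discussed as one way to approach credit assignment.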