Kasdin Jonathan, Duffy Alison, Nadler Nathan, Raha Arnav, Fairhall Adrienne L, Stachenfeld Kimberly L, Gadagkar Vikram
Department of Neuroscience, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA.
Department of Neurobiology and Biophysics and Computational Neuroscience Center, University of Washington, Seattle, WA, USA.
Nature. 2025 May;641(8063):699-706. doi: 10.1038/s41586-025-08729-1. Epub 2025 Mar 12.
Many natural motor skills, such as speaking or locomotion, are acquired through a process of trial-and-error learning over the course of development. It has long been hypothesized, motivated by observations in artificial learning experiments, that dopamine has a crucial role in this process. Dopamine in the basal ganglia is thought to guide reward-based trial-and-error learning by encoding reward prediction errors, decreasing after worse-than-predicted reward outcomes and increasing after better-than-predicted ones. Our previous work in adult zebra finches, in which we changed the perceived song quality with distorted auditory feedback, showed that dopamine in Area X, the singing-related basal ganglia, encodes performance prediction error: dopamine is suppressed after worse-than-predicted (distorted syllables) and activated after better-than-predicted (undistorted syllables) performance. However, it remains unknown whether the learning of natural behaviours, such as developmental vocal learning, occurs through dopamine-based reinforcement. Here we tracked song learning trajectories in juvenile zebra finches and used fibre photometry to monitor concurrent dopamine activity in Area X. We found that dopamine was activated after syllable renditions that were closer to the eventual adult version of the song, compared with recent renditions, and suppressed after renditions that were further away. Furthermore, the relationship between dopamine and song fluctuations revealed that dopamine predicted the future evolution of song, suggesting that dopamine drives behaviour. Finally, dopamine activity was explained by the contrast between the quality of the current rendition and the recent history of renditions, consistent with dopamine's hypothesized role in encoding prediction errors in an actor-critic reinforcement-learning model.
Reinforcement-learning algorithms have emerged as a powerful class of model to explain learning in reward-based laboratory tasks, as well as for driving autonomous learning in artificial intelligence. Our results suggest that complex natural behaviours in biological systems can also be acquired through dopamine-mediated reinforcement learning.
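The contrast described in the abstract, between the quality of the current rendition and the recent history of renditions, can be illustrated with a minimal sketch. This is not the authors' model: the function name `performance_prediction_errors`, the smoothing parameter `alpha`, the use of an exponentially weighted running average as the "recent history", and the scalar quality scores are all assumptions made for illustration.

```python
def performance_prediction_errors(qualities, alpha=0.2):
    """Return, for each rendition, its quality minus a running baseline.

    Hypothetical illustration only: `qualities` is an assumed list of scalar
    scores (higher = closer to the eventual adult song), and the baseline is
    an exponentially weighted average of earlier renditions, standing in for
    the critic's expectation in an actor-critic scheme.
    """
    if not qualities:
        return []
    baseline = qualities[0]  # assumed initialization: first rendition sets the expectation
    errors = []
    for quality in qualities:
        error = quality - baseline   # positive: better than recent history
        errors.append(error)
        baseline += alpha * error    # move the expectation toward the new rendition
    return errors


if __name__ == "__main__":
    # Made-up scores, not data from the study.
    for q, e in zip([0.5, 0.5, 0.8, 0.4],
                    performance_prediction_errors([0.5, 0.5, 0.8, 0.4], alpha=0.5)):
        print(f"quality={q:.2f}  prediction_error={e:+.3f}")
```

In this sketch a positive error corresponds to a rendition better than the recent average (where the abstract reports dopamine activation) and a negative error to a worse one (where it reports suppression).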