Program in Neuroscience and MD-PhD Program, Harvard Medical School, Boston, Massachusetts.
Center for Brain Science and Department of Psychology, Harvard University, Cambridge, Massachusetts.
J Neurophysiol. 2019 May 1;121(5):1748-1760. doi: 10.1152/jn.00817.2018. Epub 2019 Mar 13.
The modulation of interval timing by dopamine (DA) has been well established over decades of research. The nature of this modulation, however, has remained controversial: although the pharmacological evidence has largely suggested that time intervals are overestimated with higher DA levels, more recent optogenetic work has shown the opposite effect. In addition, a large body of work has established DA's role as a "reward prediction error" (RPE), a teaching signal that allows the basal ganglia to learn to predict future rewards in reinforcement learning tasks. Whether these two seemingly disparate accounts of DA are related has remained an open question. By taking a reinforcement learning-based approach to interval timing, we show here that the RPE interpretation of DA naturally extends to its role as a modulator of timekeeping and, furthermore, that this view reconciles the seemingly conflicting observations. We derive a biologically plausible, DA-dependent plasticity rule that can modulate the rate of timekeeping in either direction and whose effect depends on the timing of the DA signal itself. This bidirectional update rule accounts for the results from pharmacology and optogenetics, as well as for the behavioral effects of reward rate on interval timing and the temporal selectivity of striatal neurons. Hence, by adopting a single RPE interpretation of DA, our results take a step toward unifying computational theories of reinforcement learning and interval timing.

NEW & NOTEWORTHY How does dopamine (DA) influence interval timing? A large body of pharmacological evidence has suggested that DA accelerates timekeeping mechanisms. However, recent optogenetic work has shown exactly the opposite effect. In this article, we relate DA's role in timekeeping to its most established role, as a critical component of reinforcement learning. This allows us to derive a neurobiologically plausible framework that reconciles a broad range of DA's temporal effects, including pharmacological, behavioral, electrophysiological, and optogenetic findings.
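For reference, the RPE referred to above is conventionally formalized as the temporal-difference (TD) error of reinforcement learning; the DA-dependent plasticity rule derived in this article builds on that quantity, and its exact form, along with how the sign and timing of the DA signal translate into faster or slower timekeeping, is given in the full text rather than reproduced here. Assuming a learned value function V over states s_t and a reward r_t, the standard TD error is

\[ \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t), \]

where \gamma is the temporal discount factor and phasic DA is hypothesized to report \delta_t.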