Multi-timescale reinforcement learning in the brain.
Author information
Masset Paul, Tano Pablo, Kim HyungGoo R, Malik Athar N, Pouget Alexandre, Uchida Naoshige
Affiliations
Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA.
Center for Brain Science, Harvard University, Cambridge, MA, USA.
Publication information
Nature. 2025 Jun 4. doi: 10.1038/s41586-025-08929-9.
To thrive in complex environments, animals and artificial agents must learn to act adaptively to maximize fitness and rewards. Such adaptive behaviour can be learned through reinforcement learning, a class of algorithms that has been successful at training artificial agents and at characterizing the firing of dopaminergic neurons in the midbrain. In classical reinforcement learning, agents discount future rewards exponentially according to a single timescale, known as the discount factor. Here we explore the presence of multiple timescales in biological reinforcement learning. We first show that reinforcement learning agents learning at a multitude of timescales possess distinct computational benefits. Next, we report that dopaminergic neurons in mice performing two behavioural tasks encode reward prediction error with a diversity of discount time constants. Our model explains the heterogeneity of temporal discounting in both cue-evoked transient responses and slower timescale fluctuations known as dopamine ramps. Crucially, the measured discount factor of individual neurons is correlated across the two tasks, suggesting that it is a cell-specific property. Together, our results provide a new paradigm for understanding functional heterogeneity in dopaminergic neurons and a mechanistic basis for the empirical observation that humans and animals use non-exponential discounts in many situations, and open new avenues for the design of more-efficient reinforcement learning algorithms.
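The abstract contrasts classical single-timescale exponential discounting with a population of learners spanning many discount factors. The following is a minimal sketch of that idea in Python, not the authors' implementation: the toy cue-reward delay task and all parameter values (T, gammas, alpha) are illustrative assumptions. It runs tabular TD(0) learners in parallel, one per discount factor, each generating its own reward prediction error, mirroring the picture of individual dopaminergic neurons carrying RPEs with cell-specific discount time constants.

```python
import numpy as np

# Minimal sketch (illustrative, not the authors' code): parallel tabular
# TD(0) learners, one per discount factor. Each learner computes its own
# reward prediction error (RPE) with a cell-specific gamma.
# Hypothetical task: a cue at state 0 predicts a unit reward T steps later.

T = 10                                           # cue-to-reward delay (steps)
gammas = np.array([0.5, 0.7, 0.9, 0.95, 0.99])   # diversity of timescales
alpha = 0.1                                      # learning rate
V = np.zeros((len(gammas), T + 1))               # value tables; state T is terminal

for episode in range(3000):
    for s in range(T):
        r = 1.0 if s == T - 1 else 0.0           # reward on entering state T
        delta = r + gammas * V[:, s + 1] - V[:, s]   # one RPE per gamma
        V[:, s] += alpha * delta                 # TD(0) update, all timescales at once

# Each timescale discounts the delayed reward exponentially:
#   V_gamma(cue) -> gamma ** (T - 1)
# but their average is a mixture of exponentials, i.e. an effective
# discount curve that is no longer a single exponential (hyperbolic-like).
print(np.round(V[:, 0], 3))      # per-gamma cue values, approx. gamma**(T-1)
print(round(V[:, 0].mean(), 3))  # mixture: non-exponential effective discount
```

Because each learned cue value approaches gamma**(T-1), the vector of values across gammas behaves like a discretized Laplace transform of the future reward; reading out such a population is one route to the computational benefits of multi-timescale codes alluded to in the abstract, such as recovering when a reward is expected rather than only its discounted magnitude.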