Sousa Margarida, Bujalski Pawel, Cruz Bruno F, Louie Kenway, McNamee Daniel C, Paton Joseph J
Champalimaud Centre for the Unknown, Lisbon, Portugal.
Allen Institute for Neural Dynamics, Seattle, WA, USA.
Nature. 2025 Jun;642(8068):691-699. doi: 10.1038/s41586-025-09089-6. Epub 2025 Jun 4.
Midbrain dopamine neurons (DANs) signal reward-prediction errors that teach recipient circuits about expected rewards. However, DANs are thought to provide a substrate for temporal difference (TD) reinforcement learning (RL), an algorithm that learns the mean of temporally discounted expected future rewards, discarding useful information about experienced distributions of reward amounts and delays. Here we present time-magnitude RL (TMRL), a multidimensional variant of distributional RL that learns the joint distribution of future rewards over time and magnitude. We also uncover signatures of TMRL-like computations in the activity of optogenetically identified DANs in mice during behaviour. Specifically, we show that there is significant diversity in both temporal discounting and tuning for the reward magnitude across DANs. These features allow the computation of a two-dimensional, probabilistic map of future rewards from just 450 ms of the DAN population response to a reward-predictive cue. Furthermore, reward-time predictions derived from this code correlate with anticipatory behaviour, suggesting that similar information is used to guide decisions about when to act. Finally, by simulating behaviour in a foraging environment, we highlight the benefits of a joint probability distribution of reward over time and magnitude in the face of dynamic reward landscapes and internal states. These findings show that rich probabilistic reward information is learnt and communicated to DANs, and suggest a simple, local-in-time extension of TD algorithms that explains how such information might be acquired and computed.
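The abstract describes a population-level extension of TD learning in which units differ in temporal discounting and reward-magnitude tuning. Below is a minimal, hypothetical sketch of that idea, not the authors' implementation: a tabular task, the discount factors, the threshold-style magnitude tuning, and all parameter values are illustrative assumptions; only the combination of diverse discounts with diverse magnitude sensitivity, each trained by a standard local-in-time TD update, is taken from the abstract.

```python
# Hedged sketch of a TMRL-like population code (assumptions throughout; see lead-in).
import numpy as np

rng = np.random.default_rng(0)

# Toy task: a cue at t=0 predicts a reward after a random delay with a random magnitude.
N_STATES = 10            # states 0..9 tile time after the cue
DELAYS = [3, 7]          # possible reward delays in time steps (assumed)
MAGNITUDES = [1.0, 4.0]  # possible reward magnitudes (assumed)

# Population of units: each (discount factor, magnitude threshold) pair is one channel.
# Diverse discounts encode reward timing; thresholded responses are one simple
# stand-in for diverse magnitude tuning.
GAMMAS = np.array([0.5, 0.7, 0.85, 0.95])
THRESHOLDS = np.array([0.5, 2.0])   # a unit only "sees" rewards above its threshold
ALPHA = 0.1                         # learning rate

# Value table: V[discount index, threshold index, state]
V = np.zeros((len(GAMMAS), len(THRESHOLDS), N_STATES))

for episode in range(5000):
    delay = rng.choice(DELAYS)
    magnitude = rng.choice(MAGNITUDES)
    for t in range(N_STATES - 1):
        # reward is observed on the transition out of state t
        r = magnitude if t + 1 == delay else 0.0
        for gi, gamma in enumerate(GAMMAS):
            for hi, thr in enumerate(THRESHOLDS):
                r_tuned = r if r > thr else 0.0                  # magnitude tuning
                td = r_tuned + gamma * V[gi, hi, t + 1] - V[gi, hi, t]
                V[gi, hi, t] += ALPHA * td                       # local-in-time TD update

# Cue-evoked values across channels jointly reflect reward timing and magnitude.
print(np.round(V[:, :, 0], 3))
```

In this sketch, the cue-state values across the different discount factors form a Laplace-transform-like code of the reward-delay distribution within each magnitude channel, which is one way a two-dimensional map of reward over time and magnitude could in principle be read out from a brief population response, in the spirit of what the abstract reports.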