A multidimensional distributional map of future reward in dopamine neurons.

Author Information

Sousa Margarida, Bujalski Pawel, Cruz Bruno F, Louie Kenway, McNamee Daniel C, Paton Joseph J

Affiliations

Champalimaud Centre for the Unknown, Lisbon, Portugal.

Allen Institute for Neural Dynamics, Seattle, WA, USA.

Publication Information

Nature. 2025 Jun;642(8068):691-699. doi: 10.1038/s41586-025-09089-6. Epub 2025 Jun 4.


DOI: 10.1038/s41586-025-09089-6
PMID: 40468078
Abstract

Midbrain dopamine neurons (DANs) signal reward-prediction errors that teach recipient circuits about expected rewards. However, DANs are thought to provide a substrate for temporal difference (TD) reinforcement learning (RL), an algorithm that learns the mean of temporally discounted expected future rewards, discarding useful information about experienced distributions of reward amounts and delays. Here we present time-magnitude RL (TMRL), a multidimensional variant of distributional RL that learns the joint distribution of future rewards over time and magnitude. We also uncover signatures of TMRL-like computations in the activity of optogenetically identified DANs in mice during behaviour. Specifically, we show that there is significant diversity in both temporal discounting and tuning for the reward magnitude across DANs. These features allow the computation of a two-dimensional, probabilistic map of future rewards from just 450 ms of the DAN population response to a reward-predictive cue. Furthermore, reward-time predictions derived from this code correlate with anticipatory behaviour, suggesting that similar information is used to guide decisions about when to act. Finally, by simulating behaviour in a foraging environment, we highlight the benefits of a joint probability distribution of reward over time and magnitude in the face of dynamic reward landscapes and internal states. These findings show that rich probabilistic reward information is learnt and communicated to DANs, and suggest a simple, local-in-time extension of TD algorithms that explains how such information might be acquired and computed.
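
To make the computational contrast concrete, here is a minimal sketch — not the authors' TMRL implementation. The cue-then-delayed-reward task, the parameter grids `gammas` and `taus`, and the expectile-style asymmetric update rule are all illustrative assumptions. It shows the two axes of diversity the abstract describes: units with different discount factors carry information about when reward arrives, while units with asymmetric learning rates carry information about the distribution of reward magnitudes. A single standard TD unit (one discount factor, symmetric updates) would compress the same experience into one discounted mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative population of value predictors. Each unit has its own
# discount factor (temporal-discounting diversity) and its own asymmetry
# between learning rates for positive vs. negative prediction errors
# (expectile-style magnitude tuning). Grids are assumptions, not fits.
gammas = np.linspace(0.80, 0.99, 8)
taus = np.linspace(0.1, 0.9, 8)
G, T = np.meshgrid(gammas, taus)
G, T = G.ravel(), T.ravel()          # 64 units, one (gamma, tau) pair each

n_states = 10                        # cue at state 0, reward entering state 9
V = np.zeros((G.size, n_states))     # per-unit value tables; column 9 stays 0
alpha = 0.1

for _ in range(5000):
    # Bimodal reward magnitude: the mean (2.5) discards structure that a
    # distributional code retains.
    r_final = rng.choice([1.0, 4.0])
    for s in range(n_states - 1):
        r = r_final if s == n_states - 2 else 0.0
        delta = r + G * V[:, s + 1] - V[:, s]          # per-unit TD error
        lr = alpha * np.where(delta > 0, T, 1.0 - T)   # asymmetric rates
        V[:, s] += lr * delta

# Cue responses now vary along both axes: value grows with gamma (longer
# horizons reach the distal reward) and with tau (optimistic units settle
# near the upper reward mode, pessimistic units near the lower one).
print(np.round(V[:, 0].reshape(len(taus), len(gammas)), 2))
```

Recovering the full two-dimensional probability map from such a population — as the paper reports doing from 450 ms of DAN responses — requires a further decoding step: values learned across many discount factors act roughly as a Laplace transform of the reward-timing distribution and can be inverted over time, while the spread across asymmetries can be inverted to recover the magnitude distribution. That decoding is beyond this sketch.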


Similar Articles

[1] A multidimensional distributional map of future reward in dopamine neurons. Nature. 2025 Jun.
[2] Multi-timescale reinforcement learning in the brain. Nature. 2025 Jun 4.
[3] Adapting Safety Plans for Autistic Adults with Involvement from the Autism Community. Autism Adulthood. 2025 May 28.
[4] An auditory cortical-striatal circuit supports sound-triggered timing to predict future events. PLoS Biol. 2025 Jun 2.
[5] Stigma Management Strategies of Autistic Social Media Users. Autism Adulthood. 2025 May 28.
[6] Assessing the comparative effects of interventions in COPD: a tutorial on network meta-analysis for clinicians. Respir Res. 2024 Dec 21.
[7] A Pilot Study of Political Experiences and Barriers to Voting Among Autistic Adults Participating in Online Survey Research in the United States. Autism Adulthood. 2025 May 28.
[8] Dual neuromodulatory dynamics underlie birdsong learning. Nature. 2025 May.
[9] "Just Ask What Support We Need": Autistic Adults' Feedback on Social Skills Training. Autism Adulthood. 2025 May 28.
[10] Community views on mass drug administration for soil-transmitted helminths: a qualitative evidence synthesis. Cochrane Database Syst Rev. 2025 Jun 20.

Cited By

[1] Mesolimbic dopamine ramps reflect environmental timescales. Elife. 2025 Aug 29.
[2] Mesolimbic dopamine ramps reflect environmental timescales. bioRxiv. 2024 Apr 23.
[3] Learning temporal relationships between symbols with Laplace Neural Manifolds. ArXiv. 2024 Sep 22.

References

[1] A feature-specific prediction error model explains dopaminergic heterogeneity. Nat Neurosci. 2024 Aug.
[2] Reward prediction error neurons implement an efficient code for reward. Nat Neurosci. 2024 Jul.
[3] Distributional coding of associative learning in discrete populations of midbrain dopamine neurons. Cell Rep. 2024 Apr 23.
[4] Distributional reinforcement learning in prefrontal cortex. Nat Neurosci. 2024 Mar.
[5] Dopaminergic prediction errors in the ventral tegmental area reflect a multithreaded predictive model. Nat Neurosci. 2023 May.
[6] Mesolimbic dopamine release conveys causal associations. Science. 2022 Dec 23.
[7] Asymmetric and adaptive reward coding via normalized reinforcement learning. PLoS Comput Biol. 2022 Jul.
[8] Action suppression reveals opponent parallel control via striatal circuits. Nature. 2022 Jul.
[9] A distributional code for value in dopamine-based reinforcement learning. Nature. 2020 Jan 15.
[10] Temporally restricted dopaminergic control of reward-conditioned movements. Nat Neurosci. 2020 Jan 13.
