

A distributional code for value in dopamine-based reinforcement learning.

Affiliations

DeepMind, London, UK.

Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, UK.

Publication information

Nature. 2020 Jan;577(7792):671-675. doi: 10.1038/s41586-019-1924-6. Epub 2020 Jan 15.

Abstract

Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain. According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning. We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which we tested using single-unit recordings from mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning.
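The core idea the abstract describes can be illustrated with a toy simulation. The sketch below is not the authors' model (their analysis centres on expectile-like codes); it is a minimal quantile-style distributional TD learner in which each simulated "neuron" scales positive and negative prediction errors asymmetrically, so the population converges to different quantiles of a stochastic reward distribution rather than to a single mean. All names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_reward():
    """Bimodal stochastic reward: 1 or 10 with equal probability."""
    return rng.choice([1.0, 10.0])

# Each simulated unit i has its own asymmetry tau_i: positive prediction
# errors are weighted by tau_i, negative ones by (1 - tau_i). With a small
# learning rate, unit i's value estimate settles near the tau_i-th
# quantile of the reward distribution, so the population as a whole
# encodes the distribution, not just its mean.
taus = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
values = np.zeros_like(taus)
lr = 0.01

for _ in range(20000):
    r = sample_reward()
    delta = r - values                          # per-unit prediction errors
    scale = np.where(delta > 0, taus, 1.0 - taus)
    values += lr * scale * np.sign(delta)       # asymmetric quantile update
```

After training, units with small tau sit near the low reward outcome and units with large tau near the high one: optimistic and pessimistic value estimates coexist in parallel, which is the population-level signature the paper tests for in dopamine recordings.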


Similar articles

1
Arithmetic and local circuitry underlying dopamine prediction errors.
Nature. 2015 Sep 10;525(7568):243-6. doi: 10.1038/nature14855. Epub 2015 Aug 31.
2
Systems Neuroscience: Shaping the Reward Prediction Error Signal.
Curr Biol. 2015 Nov 16;25(22):R1081-4. doi: 10.1016/j.cub.2015.09.057.
3
Neuron-type-specific signals for reward and punishment in the ventral tegmental area.
Nature. 2012 Jan 18;482(7383):85-8. doi: 10.1038/nature10754.
4
Optogenetic Blockade of Dopamine Transients Prevents Learning Induced by Changes in Reward Features.
Curr Biol. 2017 Nov 20;27(22):3480-3486.e3. doi: 10.1016/j.cub.2017.09.049. Epub 2017 Nov 2.
5
A feature-specific prediction error model explains dopaminergic heterogeneity.
Nat Neurosci. 2024 Aug;27(8):1574-1586. doi: 10.1038/s41593-024-01689-1. Epub 2024 Jul 3.
6
Ventral Tegmental Dopamine Neurons Participate in Reward Identity Predictions.
Curr Biol. 2019 Jan 7;29(1):93-103.e3. doi: 10.1016/j.cub.2018.11.050. Epub 2018 Dec 20.
7
Dopaminergic prediction errors in the ventral tegmental area reflect a multithreaded predictive model.
Nat Neurosci. 2023 May;26(5):830-839. doi: 10.1038/s41593-023-01310-x. Epub 2023 Apr 20.
8
Neuronal implementation of the temporal difference learning algorithm in the midbrain dopaminergic system.
Proc Natl Acad Sci U S A. 2023 Nov 7;120(45):e2309015120. doi: 10.1073/pnas.2309015120. Epub 2023 Oct 30.

Cited by

1
Correctness is its own reward: bootstrapping error signals in self-guided reinforcement learning.
bioRxiv. 2025 Aug 19:2025.07.18.665446. doi: 10.1101/2025.07.18.665446.
2
Experience-based risk taking is primarily shaped by prior learning rather than by decision-making.
Nat Commun. 2025 Jul 9;16(1):6310. doi: 10.1038/s41467-025-61609-0.
3
The interoceptive origin of reinforcement learning.
Trends Cogn Sci. 2025 Sep;29(9):840-854. doi: 10.1016/j.tics.2025.05.008. Epub 2025 Jun 10.
4
A multidimensional distributional map of future reward in dopamine neurons.
Nature. 2025 Jun;642(8068):691-699. doi: 10.1038/s41586-025-09089-6. Epub 2025 Jun 4.
5
Multi-timescale reinforcement learning in the brain.
Nature. 2025 Jun 4. doi: 10.1038/s41586-025-08929-9.
6
Early versus late noise differentially enhances or degrades context-dependent choice.
Nat Commun. 2025 Apr 23;16(1):3828. doi: 10.1038/s41467-025-59140-3.
