Suppr 超能文献



Uncertainty-guided learning with scaled prediction errors in the basal ganglia.

Affiliations

Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom.

Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom.

Publication Info

PLoS Comput Biol. 2022 May 27;18(5):e1009816. doi: 10.1371/journal.pcbi.1009816. eCollection 2022 May.

DOI: 10.1371/journal.pcbi.1009816
PMID: 35622863
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9182698/
Abstract

To accurately predict rewards associated with states or actions, the variability of observations has to be taken into account. In particular, when the observations are noisy, the individual rewards should have less influence on tracking of average reward, and the estimate of the mean reward should be updated to a smaller extent after each observation. However, it is not known how the magnitude of the observation noise might be tracked and used to control prediction updates in the brain reward system. Here, we introduce a new model that uses simple, tractable learning rules that track the mean and standard deviation of reward, and leverages prediction errors scaled by uncertainty as the central feedback signal. We show that the new model has an advantage over conventional reinforcement learning models in a value tracking task, and approaches a theoretic limit of performance provided by the Kalman filter. Further, we propose a possible biological implementation of the model in the basal ganglia circuit. In the proposed network, dopaminergic neurons encode reward prediction errors scaled by standard deviation of rewards. We show that such scaling may arise if the striatal neurons learn the standard deviation of rewards and modulate the activity of dopaminergic neurons. The model is consistent with experimental findings concerning dopamine prediction error scaling relative to reward magnitude, and with many features of striatal plasticity. Our results span across the levels of implementation, algorithm, and computation, and might have important implications for understanding the dopaminergic prediction error signal and its relation to adaptive and effective learning.
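The learning scheme described in the abstract — simple, tractable rules that track both the mean and the standard deviation of reward, with value updates driven by an uncertainty-scaled prediction error — can be sketched numerically. This is a minimal illustration, not the paper's implementation: the delta rules, parameter values, and all names below (`simulate`, `alpha_v`, `alpha_s`) are assumptions for demonstration; the exact update equations are given in the paper.

```python
import random

def simulate(n_trials=5000, true_mean=1.0, noise_sd=2.0,
             alpha_v=0.1, alpha_s=0.02, seed=0):
    """Track the mean and spread of a noisy reward with simple delta
    rules; the value estimate is updated by the prediction error
    scaled by estimated uncertainty (noisier rewards -> smaller updates)."""
    rng = random.Random(seed)
    v = 0.0   # running estimate of mean reward
    s = 1.0   # running estimate of reward spread (SD-like quantity)
    for _ in range(n_trials):
        r = rng.gauss(true_mean, noise_sd)
        delta = r - v                     # raw reward prediction error
        scaled = delta / s                # uncertainty-scaled prediction error
        v += alpha_v * scaled             # update shrinks as estimated noise grows
        s += alpha_s * (abs(delta) - s)   # delta rule tracking mean |error|
    return v, s

v, s = simulate()
# v settles near the true mean reward; s settles near the mean absolute
# prediction error, which grows with the observation noise.
```

The key property the abstract emphasizes falls out of the `delta / s` scaling: when observations are noisy, `s` is large, so each individual reward moves the value estimate less — the behavior a Kalman filter achieves with explicit posterior variance, here obtained from two scalar delta rules.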


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e875/9182698/a0701f741368/pcbi.1009816.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e875/9182698/b3d28fa71818/pcbi.1009816.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e875/9182698/f230c9db42e8/pcbi.1009816.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e875/9182698/c4b8d39aa8d7/pcbi.1009816.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e875/9182698/61c9226447e1/pcbi.1009816.g005.jpg

Similar Articles

1
Uncertainty-guided learning with scaled prediction errors in the basal ganglia.
PLoS Comput Biol. 2022 May 27;18(5):e1009816. doi: 10.1371/journal.pcbi.1009816. eCollection 2022 May.
2
Reward Bases: A simple mechanism for adaptive acquisition of multiple reward types.
PLoS Comput Biol. 2024 Nov 19;20(11):e1012580. doi: 10.1371/journal.pcbi.1012580. eCollection 2024 Nov.
3
Involvement of basal ganglia and orbitofrontal cortex in goal-directed behavior.
Prog Brain Res. 2000;126:193-215. doi: 10.1016/S0079-6123(00)26015-9.
4
Striatal dopamine ramping may indicate flexible reinforcement learning with forgetting in the cortico-basal ganglia circuits.
Front Neural Circuits. 2014 Apr 9;8:36. doi: 10.3389/fncir.2014.00036. eCollection 2014.
5
A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task.
Neuroscience. 1999;91(3):871-90. doi: 10.1016/s0306-4522(98)00697-6.
6
Learning Reward Uncertainty in the Basal Ganglia.
PLoS Comput Biol. 2016 Sep 2;12(9):e1005062. doi: 10.1371/journal.pcbi.1005062. eCollection 2016 Sep.
7
Dopamine role in learning and action inference.
Elife. 2020 Jul 7;9:e53262. doi: 10.7554/eLife.53262.
8
Anticipatory reward signals in ventral striatal neurons of behaving rats.
Eur J Neurosci. 2008 Nov;28(9):1849-66. doi: 10.1111/j.1460-9568.2008.06480.x.
9
Differential magnitude coding of gains and omitted rewards in the ventral striatum.
Brain Res. 2011 Sep 9;1411:76-86. doi: 10.1016/j.brainres.2011.07.019. Epub 2011 Jul 18.
10
Dopamine errors drive excitatory and inhibitory components of backward conditioning in an outcome-specific manner.
Curr Biol. 2022 Jul 25;32(14):3210-3218.e3. doi: 10.1016/j.cub.2022.06.035. Epub 2022 Jun 24.

Cited By

1
A decision-space model explains context-specific decision-making.
Res Sq. 2024 Dec 3:rs.3.rs-5499511. doi: 10.21203/rs.3.rs-5499511/v1.
2
Explaining dopamine through prediction errors and beyond.
Nat Neurosci. 2024 Sep;27(9):1645-1655. doi: 10.1038/s41593-024-01705-4. Epub 2024 Jul 25.
3
Dopamine encoding of novelty facilitates efficient uncertainty-driven exploration.
Nat Neurosci. 2021 Apr;24(4):465-469. doi: 10.1038/s41593-021-00807-7. Epub 2021 Mar 8.

References

1
The dopamine circuit as a reward-taxis navigation system.
PLoS Comput Biol. 2022 Jul 25;18(7):e1010340. doi: 10.1371/journal.pcbi.1010340. eCollection 2022 Jul.
2
An association between prediction errors and risk-seeking: Theory and behavioral evidence.
PLoS Comput Biol. 2021 Jul 16;17(7):e1009213. doi: 10.1371/journal.pcbi.1009213. eCollection 2021 Jul.
3
Rare rewards amplify dopamine responses.
PLoS Comput Biol. 2024 Apr 16;20(4):e1011516. doi: 10.1371/journal.pcbi.1011516. eCollection 2024 Apr.
4
Predictive coding networks for temporal prediction.
PLoS Comput Biol. 2024 Apr 1;20(4):e1011183. doi: 10.1371/journal.pcbi.1011183. eCollection 2024 Apr.
5
Dopamine role in learning and action inference.
Elife. 2020 Jul 7;9:e53262. doi: 10.7554/eLife.53262.
6
A simple model for learning in volatile environments.
PLoS Comput Biol. 2020 Jul 1;16(7):e1007963. doi: 10.1371/journal.pcbi.1007963. eCollection 2020 Jul.
7
Precision weighting of cortical unsigned prediction error signals benefits learning, is mediated by dopamine, and is impaired in psychosis.
Mol Psychiatry. 2021 Sep;26(9):5320-5333. doi: 10.1038/s41380-020-0803-8. Epub 2020 Jun 24.
8
Dopaminergic Transmission Rapidly and Persistently Enhances Excitability of D1 Receptor-Expressing Striatal Projection Neurons.
Neuron. 2020 Apr 22;106(2):277-290.e6. doi: 10.1016/j.neuron.2020.01.028. Epub 2020 Feb 18.
9
A distributional code for value in dopamine-based reinforcement learning.
Nature. 2020 Jan;577(7792):671-675. doi: 10.1038/s41586-019-1924-6. Epub 2020 Jan 15.
10
Effects of reward size and context on learning in macaque monkeys.
Behav Brain Res. 2019 Oct 17;372:111983. doi: 10.1016/j.bbr.2019.111983. Epub 2019 May 26.
11
Learning the payoffs and costs of actions.
PLoS Comput Biol. 2019 Feb 28;15(2):e1006285. doi: 10.1371/journal.pcbi.1006285. eCollection 2019 Feb.