

Reward Bases: A simple mechanism for adaptive acquisition of multiple reward types.

Authors

Millidge Beren, Song Yuhang, Lak Armin, Walton Mark E, Bogacz Rafal

Affiliations

MRC Brain Network Dynamics Unit, University of Oxford, Oxford, United Kingdom.

Department of Physiology, Anatomy & Genetics, University of Oxford, Oxford, United Kingdom.

Publication

PLoS Comput Biol. 2024 Nov 19;20(11):e1012580. doi: 10.1371/journal.pcbi.1012580. eCollection 2024 Nov.

DOI: 10.1371/journal.pcbi.1012580
PMID: 39561186
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11614280/
Abstract

Animals can adapt their preferences for different types of reward according to physiological state, such as hunger or thirst. To explain this ability, we employ a simple multi-objective reinforcement learning model that learns multiple values according to different reward dimensions such as food or water. We show that by weighting these learned values according to the current needs, behaviour may be flexibly adapted to present preferences. This model predicts that individual dopamine neurons should encode the errors associated with some reward dimensions more than with others. To provide a preliminary test of this prediction, we reanalysed a small dataset obtained from a single primate in an experiment which to our knowledge is the only published study where the responses of dopamine neurons to stimuli predicting distinct types of rewards were recorded. We observed that in addition to subjective economic value, dopamine neurons encode a gradient of reward dimensions; some neurons respond most to stimuli predicting food rewards while the others respond more to stimuli predicting fluids. We also proposed a possible implementation of the model in the basal ganglia network, and demonstrated how the striatal system can learn values in multiple dimensions, even when dopamine neurons encode mixtures of prediction error from different dimensions. Additionally, the model reproduces the instant generalisation to new physiological states seen in dopamine responses and in behaviour. Our results demonstrate how a simple neural circuit can flexibly guide behaviour according to animals' needs.
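The multi-objective mechanism the abstract describes — learn a separate value function per reward dimension, then weight those values by current physiological need at decision time — can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the state space, dimension names, learning rate, and need weights below are all assumptions.

```python
import numpy as np

n_states = 5
dims = ["food", "water"]   # illustrative reward dimensions
alpha, gamma = 0.1, 0.9    # assumed learning rate and discount factor

# One learned value vector per reward dimension (a "reward basis").
V = {d: np.zeros(n_states) for d in dims}

def td_update(state, next_state, rewards):
    """Per-dimension TD(0) update; `rewards` maps dimension -> scalar reward."""
    for d in dims:
        delta = rewards[d] + gamma * V[d][next_state] - V[d][state]
        V[d][state] += alpha * delta

def total_value(state, needs):
    """Combine the learned bases, weighted by current physiological need."""
    return sum(needs[d] * V[d][state] for d in dims)

# Train: state 0 predicts food (delivered in state 1);
# state 2 predicts water (delivered in state 3).
for _ in range(200):
    td_update(0, 1, {"food": 1.0, "water": 0.0})
    td_update(2, 3, {"food": 0.0, "water": 1.0})

# A hungry but not thirsty agent immediately prefers the food-predicting
# state: only the need weights change, no relearning is required.
hungry = {"food": 1.0, "water": 0.1}
print(total_value(0, hungry) > total_value(2, hungry))
```

Reweighting at decision time is what yields the instant generalisation to new physiological states mentioned in the abstract: switching `hungry` to a thirsty weighting flips the preference without any further TD updates.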


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eab3/11614280/82e384e58a56/pcbi.1012580.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eab3/11614280/96e1c8d14eae/pcbi.1012580.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eab3/11614280/36b41a21ac79/pcbi.1012580.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eab3/11614280/5f3904c8df3d/pcbi.1012580.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eab3/11614280/2464faafedcd/pcbi.1012580.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eab3/11614280/65a688aeecda/pcbi.1012580.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eab3/11614280/9031c1b92d3a/pcbi.1012580.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eab3/11614280/880c16263bcd/pcbi.1012580.g008.jpg

Similar Articles

1. Reward Bases: A simple mechanism for adaptive acquisition of multiple reward types.
PLoS Comput Biol. 2024 Nov 19;20(11):e1012580. doi: 10.1371/journal.pcbi.1012580. eCollection 2024 Nov.
2. A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task.
Neuroscience. 1999;91(3):871-90. doi: 10.1016/s0306-4522(98)00697-6.
3. Uncertainty-guided learning with scaled prediction errors in the basal ganglia.
PLoS Comput Biol. 2022 May 27;18(5):e1009816. doi: 10.1371/journal.pcbi.1009816. eCollection 2022 May.
4. Involvement of basal ganglia and orbitofrontal cortex in goal-directed behavior.
Prog Brain Res. 2000;126:193-215. doi: 10.1016/S0079-6123(00)26015-9.
5. Dopamine role in learning and action inference.
Elife. 2020 Jul 7;9:e53262. doi: 10.7554/eLife.53262.
6. Modeling the effects of motivation on choice and learning in the basal ganglia.
PLoS Comput Biol. 2020 May 26;16(5):e1007465. doi: 10.1371/journal.pcbi.1007465. eCollection 2020 May.
7. Dopamine encoding of novelty facilitates efficient uncertainty-driven exploration.
PLoS Comput Biol. 2024 Apr 16;20(4):e1011516. doi: 10.1371/journal.pcbi.1011516. eCollection 2024 Apr.
8. Striatal dopamine ramping may indicate flexible reinforcement learning with forgetting in the cortico-basal ganglia circuits.
Front Neural Circuits. 2014 Apr 9;8:36. doi: 10.3389/fncir.2014.00036. eCollection 2014.
9. Dopamine prediction error responses integrate subjective value from different reward dimensions.
Proc Natl Acad Sci U S A. 2014 Feb 11;111(6):2343-8. doi: 10.1073/pnas.1321596111. Epub 2014 Jan 22.
10. Reward functions of the basal ganglia.
J Neural Transm (Vienna). 2016 Jul;123(7):679-693. doi: 10.1007/s00702-016-1510-0. Epub 2016 Feb 2.

Cited By

1. A decision-space model explains context-specific decision-making.
Nat Commun. 2025 Aug 14;16(1):7437. doi: 10.1038/s41467-025-61466-x.
2. Striatal dopamine signals errors in prediction across different informational domains.
Sci Adv. 2025 Jul 11;11(28):eadq9684. doi: 10.1126/sciadv.adq9684. Epub 2025 Jul 9.
3. Multi-timescale reinforcement learning in the brain.
Nature. 2025 Jun 4. doi: 10.1038/s41586-025-08929-9.
4. Prospective contingency explains behavior and dopamine signals during associative learning.
Nat Neurosci. 2025 Mar 18. doi: 10.1038/s41593-025-01915-4.
5. The devilish details affecting TDRL models in dopamine research.
Trends Cogn Sci. 2025 May;29(5):434-447. doi: 10.1016/j.tics.2025.02.001. Epub 2025 Feb 26.
6. The curious case of dopaminergic prediction errors and learning associative information beyond value.
Nat Rev Neurosci. 2025 Mar;26(3):169-178. doi: 10.1038/s41583-024-00898-8. Epub 2025 Jan 8.
7. Dopaminergic responses to identity prediction errors depend differently on the orbitofrontal cortex and hippocampus.
bioRxiv. 2024 Dec 17:2024.12.11.628003. doi: 10.1101/2024.12.11.628003.

References

1. A feature-specific prediction error model explains dopaminergic heterogeneity.
Nat Neurosci. 2024 Aug;27(8):1574-1586. doi: 10.1038/s41593-024-01689-1. Epub 2024 Jul 3.
2. Overlapping representations of food and social stimuli in mouse VTA dopamine neurons.
Neuron. 2023 Nov 15;111(22):3541-3553.e8. doi: 10.1016/j.neuron.2023.08.003. Epub 2023 Aug 31.
3. Feasibility of dopamine as a vector-valued feedback signal in the basal ganglia.
Proc Natl Acad Sci U S A. 2023 Aug 8;120(32):e2221994120. doi: 10.1073/pnas.2221994120. Epub 2023 Aug 1.
4. Having multiple selves helps learning agents explore and adapt in complex changing worlds.
Proc Natl Acad Sci U S A. 2023 Jul 11;120(28):e2221180120. doi: 10.1073/pnas.2221180120. Epub 2023 Jul 3.
5. Dopaminergic prediction errors in the ventral tegmental area reflect a multithreaded predictive model.
Nat Neurosci. 2023 May;26(5):830-839. doi: 10.1038/s41593-023-01310-x. Epub 2023 Apr 20.
6. Homeostatic Reinforcement Theory Accounts for Sodium Appetitive State- and Taste-Dependent Dopamine Responding.
Nutrients. 2023 Feb 17;15(4):1015. doi: 10.3390/nu15041015.
7. Nutrient-Sensitive Reinforcement Learning in Monkeys.
J Neurosci. 2023 Mar 8;43(10):1714-1730. doi: 10.1523/JNEUROSCI.0752-22.2022. Epub 2023 Jan 20.
8. Dopamine subsystems that track internal states.
Nature. 2022 Aug;608(7922):374-380. doi: 10.1038/s41586-022-04954-0. Epub 2022 Jul 13.
9. Hunger improves reinforcement-driven but not planned action.
Cogn Affect Behav Neurosci. 2021 Dec;21(6):1196-1206. doi: 10.3758/s13415-021-00921-w. Epub 2021 Oct 15.
10. A neuronal mechanism controlling the choice between feeding and sexual behaviors in Drosophila.
Curr Biol. 2021 Oct 11;31(19):4231-4245.e4. doi: 10.1016/j.cub.2021.07.029. Epub 2021 Aug 5.