Adaptive properties of differential learning rates for positive and negative outcomes.

Authors

Cazé Romain D, van der Meer Matthijs A A

Affiliation

Department of Bioengineering, Imperial College, London, UK.

Publication

Biol Cybern. 2013 Dec;107(6):711-9. doi: 10.1007/s00422-013-0571-5. Epub 2013 Oct 2.

DOI: 10.1007/s00422-013-0571-5
PMID: 24085507
Abstract

The concept of the reward prediction error, the difference between reward obtained and reward predicted, continues to be a focal point for much theoretical and experimental work in psychology, cognitive science, and neuroscience. Models that rely on reward prediction errors typically assume a single learning rate for positive and negative prediction errors. However, behavioral data indicate that better-than-expected and worse-than-expected outcomes often do not have symmetric impacts on learning and decision-making. Furthermore, distinct circuits within cortico-striatal loops appear to support learning from positive and negative prediction errors, respectively. Such differential learning rates would be expected to lead to biased reward predictions and therefore suboptimal choice performance. Contrary to this intuition, we show that on static "bandit" choice tasks, differential learning rates can be adaptive. This occurs because asymmetric learning enables a better separation of learned reward probabilities. We show analytically how the optimal learning rate asymmetry depends on the reward distribution and implement a biologically plausible algorithm that adapts the balance of positive and negative learning rates from experience. These results suggest specific adaptive advantages for separate, differential learning rates in simple reinforcement learning settings and provide a novel, normative perspective on the interpretation of associated neural data.
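To make the idea of differential learning rates concrete, here is a minimal Python sketch. It is not the authors' code. It assumes a delta-rule value update on a two-outcome (reward 1 or 0) bandit arm. The function names, the starting value of 0.5, and the example learning rates are all hypothetical choices for illustration. Under these assumptions, setting the expected update to zero gives a fixed point Q* = p·α⁺ / (p·α⁺ + (1 − p)·α⁻), which equals the true reward probability p only when the two rates are equal.

```python
import random


def update(q, reward, alpha_pos, alpha_neg):
    """One delta-rule update with separate learning rates (illustrative)."""
    delta = reward - q  # reward prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return q + alpha * delta


def expected_fixed_point(p, alpha_pos, alpha_neg):
    """Value at which the expected update is zero for a Bernoulli(p) reward."""
    return p * alpha_pos / (p * alpha_pos + (1 - p) * alpha_neg)


def simulate(p, alpha_pos, alpha_neg, n_trials=20000, seed=0):
    """Average learned value over the second half of a simulated run."""
    rng = random.Random(seed)
    q = 0.5  # assumed initial value
    burn_in = n_trials // 2
    total = 0.0
    for t in range(n_trials):
        reward = 1.0 if rng.random() < p else 0.0
        q = update(q, reward, alpha_pos, alpha_neg)
        if t >= burn_in:
            total += q
    return total / (n_trials - burn_in)


if __name__ == "__main__":
    # Two low-reward arms; compare symmetric and asymmetric learning rates.
    for a_pos, a_neg in [(0.01, 0.01), (0.04, 0.01)]:
        values = [expected_fixed_point(p, a_pos, a_neg) for p in (0.1, 0.2)]
        print(a_pos, a_neg, values, "gap:", values[1] - values[0])
```

In this toy setting with arms paying off at p = 0.1 and p = 0.2, the symmetric rates yield fixed points 0.1 and 0.2, while a fourfold larger positive rate yields roughly 0.31 and 0.5. The estimates are biased, but the gap between them is wider, which is one way to read the abstract's claim about better separation of learned reward probabilities.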