
Intrinsic rewards explain context-sensitive valuation in reinforcement learning.

Affiliations

Department of Psychology, University of California, Berkeley, Berkeley, California, United States of America.

Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, California, United States of America.

Publication information

PLoS Biol. 2023 Jul 17;21(7):e3002201. doi: 10.1371/journal.pbio.3002201. eCollection 2023 Jul.

DOI: 10.1371/journal.pbio.3002201
PMID: 37459394
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10374061/
Abstract

When observing the outcome of a choice, people are sensitive to the choice's context, such that the experienced value of an option depends on the alternatives: getting $1 when the possibilities were 0 or 1 feels much better than when the possibilities were 1 or 10. Context-sensitive valuation has been documented within reinforcement learning (RL) tasks, in which values are learned from experience through trial and error. Range adaptation, wherein options are rescaled according to the range of values yielded by available options, has been proposed to account for this phenomenon. However, we propose that other mechanisms-reflecting a different theoretical viewpoint-may also explain this phenomenon. Specifically, we theorize that internally defined goals play a crucial role in shaping the subjective value attributed to any given option. Motivated by this theory, we develop a new "intrinsically enhanced" RL model, which combines extrinsically provided rewards with internally generated signals of goal achievement as a teaching signal. Across 7 different studies (including previously published data sets as well as a novel, preregistered experiment with replication and control studies), we show that the intrinsically enhanced model can explain context-sensitive valuation as well as, or better than, range adaptation. Our findings indicate a more prominent role of intrinsic, goal-dependent rewards than previously recognized within formal models of human RL. By integrating internally generated signals of reward, standard RL theories should better account for human behavior, including context-sensitive valuation and beyond.
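The abstract describes the "intrinsically enhanced" model only at a conceptual level: the teaching signal mixes the extrinsically provided outcome with an internally generated signal of goal achievement. The sketch below illustrates that general idea with a simple delta-rule update; the parameter names (alpha, omega), the binary goal signal, and the threshold-based goal definition are illustrative assumptions for this sketch, not the authors' published equations.

```python
# Minimal sketch of an "intrinsically enhanced" RL update, based only on the
# abstract above. Parameter names and the exact functional form are assumptions.
import numpy as np

def intrinsically_enhanced_update(q, action, extrinsic_reward, goal_threshold,
                                  alpha=0.1, omega=0.5):
    """Update the value of `action` with a teaching signal that mixes the
    extrinsic outcome and an internally generated goal-achievement signal."""
    # Internally generated signal: 1 if the outcome meets the context-defined
    # goal (e.g., obtaining the better option available in this context).
    intrinsic_reward = 1.0 if extrinsic_reward >= goal_threshold else 0.0

    # Combined teaching signal: weighted mix of extrinsic and intrinsic reward.
    teaching_signal = (1 - omega) * extrinsic_reward + omega * intrinsic_reward

    # Standard delta-rule (Q-learning-style) value update.
    prediction_error = teaching_signal - q[action]
    q[action] = q[action] + alpha * prediction_error
    return q

# The same $1 outcome yields different teaching signals depending on the
# context's goal: in a {0, 1} context the goal is met, in a {1, 10} context
# it is not, so the learned value of the $1 option ends up lower.
q_low = intrinsically_enhanced_update(np.zeros(2), 0, extrinsic_reward=1.0, goal_threshold=1.0)
q_high = intrinsically_enhanced_update(np.zeros(2), 0, extrinsic_reward=1.0, goal_threshold=10.0)
print(q_low[0], q_high[0])  # 0.1 vs 0.05 with the defaults above
```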


Figures (pbio.3002201.g001–g008)
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d994/10374061/f3827c2a14de/pbio.3002201.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d994/10374061/7c62f74dba76/pbio.3002201.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d994/10374061/ed2740e25da0/pbio.3002201.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d994/10374061/766682ed7237/pbio.3002201.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d994/10374061/bf6a66691fc7/pbio.3002201.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d994/10374061/8a0a689319b8/pbio.3002201.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d994/10374061/53df696c3041/pbio.3002201.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d994/10374061/25043dc3bffd/pbio.3002201.g008.jpg

Similar articles

1. Intrinsic rewards explain context-sensitive valuation in reinforcement learning. PLoS Biol. 2023 Jul 17;21(7):e3002201. doi: 10.1371/journal.pbio.3002201. eCollection 2023 Jul.
2. Nutrient-Sensitive Reinforcement Learning in Monkeys. J Neurosci. 2023 Mar 8;43(10):1714-1730. doi: 10.1523/JNEUROSCI.0752-22.2022. Epub 2023 Jan 20.
3. Neuro-Inspired Reinforcement Learning to Improve Trajectory Prediction in Reward-Guided Behavior. Int J Neural Syst. 2022 Sep;32(9):2250038. doi: 10.1142/S0129065722500381. Epub 2022 Aug 19.
4. Testing models of context-dependent outcome encoding in reinforcement learning. Cognition. 2023 Jan;230:105280. doi: 10.1016/j.cognition.2022.105280. Epub 2022 Sep 12.
5. Where Does Value Come From? Trends Cogn Sci. 2019 Oct;23(10):836-850. doi: 10.1016/j.tics.2019.07.012. Epub 2019 Sep 4.
6. Asymmetric and adaptive reward coding via normalized reinforcement learning. PLoS Comput Biol. 2022 Jul 21;18(7):e1010350. doi: 10.1371/journal.pcbi.1010350. eCollection 2022 Jul.
7. A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task. Neuroscience. 1999;91(3):871-90. doi: 10.1016/s0306-4522(98)00697-6.
8. A universal role of the ventral striatum in reward-based learning: evidence from human studies. Neurobiol Learn Mem. 2014 Oct;114:90-100. doi: 10.1016/j.nlm.2014.05.002. Epub 2014 May 10.
9. Training an Actor-Critic Reinforcement Learning Controller for Arm Movement Using Human-Generated Rewards. IEEE Trans Neural Syst Rehabil Eng. 2017 Oct;25(10):1892-1905. doi: 10.1109/TNSRE.2017.2700395. Epub 2017 May 2.
10. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation. PLoS Comput Biol. 2016 Oct 13;12(10):e1005145. doi: 10.1371/journal.pcbi.1005145. eCollection 2016 Oct.

Cited by

1. Relative Value Encoding in Large Language Models: A Multi-Task, Multi-Model Investigation. Open Mind (Camb). 2025 May 9;9:709-725. doi: 10.1162/opmi_a_00209. eCollection 2025.
2. Distributional dual-process model predicts strategic shifts in decision-making under uncertainty. Commun Psychol. 2025 Apr 14;3(1):61. doi: 10.1038/s44271-025-00249-y.
3. Unraveling the Intricacies of Curiosity: A Comprehensive Study of Its Measures in the Chinese Context. Psych J. 2025 Apr;14(2):219-234. doi: 10.1002/pchj.813. Epub 2024 Nov 20.
4. Fundamental processes in sensorimotor learning: Reasoning, refinement, and retrieval. Elife. 2024 Aug 1;13:e91839. doi: 10.7554/eLife.91839.
5. Frequent winners explain apparent skewness preferences in experience-based decisions. Proc Natl Acad Sci U S A. 2024 Mar 19;121(12):e2317751121. doi: 10.1073/pnas.2317751121. Epub 2024 Mar 15.
6. Goal-directed learning in adolescence: neurocognitive development and contextual influences. Nat Rev Neurosci. 2024 Mar;25(3):176-194. doi: 10.1038/s41583-023-00783-w. Epub 2024 Jan 23.
7. A novel technique for delineating the effect of variation in the learning rate on the neural correlates of reward prediction errors in model-based fMRI. Front Psychol. 2023 Dec 21;14:1211528. doi: 10.3389/fpsyg.2023.1211528. eCollection 2023.
8. Naturalistic reinforcement learning. Trends Cogn Sci. 2024 Feb;28(2):144-158. doi: 10.1016/j.tics.2023.08.016. Epub 2023 Sep 29.

References cited in this article

1. The functional form of value normalization in human reinforcement learning. Elife. 2023 Jul 10;12:e83891. doi: 10.7554/eLife.83891.
2. Testing models of context-dependent outcome encoding in reinforcement learning. Cognition. 2023 Jan;230:105280. doi: 10.1016/j.cognition.2022.105280. Epub 2022 Sep 12.
3. Reinforcement learning in and out of context: The effects of attentional focus. J Exp Psychol Learn Mem Cogn. 2023 Aug;49(8):1193-1217. doi: 10.1037/xlm0001145. Epub 2022 Jul 4.
4. Human value learning and representation reflect rational adaptation to task demands. Nat Hum Behav. 2022 Sep;6(9):1268-1279. doi: 10.1038/s41562-022-01360-4. Epub 2022 May 30.
5. The Role of Executive Function in Shaping Reinforcement Learning. Curr Opin Behav Sci. 2021 Apr;38:66-73. doi: 10.1016/j.cobeha.2020.10.003. Epub 2020 Nov 14.
6. Extrinsic rewards, intrinsic rewards, and non-optimal behavior. J Comput Neurosci. 2022 May;50(2):139-143. doi: 10.1007/s10827-022-00813-z. Epub 2022 Feb 5.
7. Asymmetric reinforcement learning facilitates human inference of transitive relations. Nat Hum Behav. 2022 Apr;6(4):555-564. doi: 10.1038/s41562-021-01263-w. Epub 2022 Jan 31.
8. Filling the gaps: Cognitive control as a critical lens for understanding mechanisms of value-based decision-making. Neurosci Biobehav Rev. 2022 Mar;134:104483. doi: 10.1016/j.neubiorev.2021.12.006. Epub 2021 Dec 10.
9. A Neurocomputational Model for Intrinsic Reward. J Neurosci. 2021 Oct 27;41(43):8963-8971. doi: 10.1523/JNEUROSCI.0858-20.2021. Epub 2021 Sep 20.
10. Executive Function Assigns Value to Novel Goal-Congruent Outcomes. Cereb Cortex. 2021 Nov 23;32(1):231-247. doi: 10.1093/cercor/bhab205.