

Testing models of context-dependent outcome encoding in reinforcement learning.

Affiliations

Department of Psychology, University of South Carolina, USA.

Publication information

Cognition. 2023 Jan;230:105280. doi: 10.1016/j.cognition.2022.105280. Epub 2022 Sep 12.

DOI: 10.1016/j.cognition.2022.105280
PMID: 36099856
Abstract

Previous studies of reinforcement learning (RL) have established that choice outcomes are encoded in a context-dependent fashion. Several computational models have been proposed to explain context-dependent encoding, including reference point centering and range adaptation models. The former assumes that outcomes are centered around a running estimate of the average reward in each choice context, while the latter assumes that outcomes are compared to the minimum reward and then scaled by an estimate of the range of outcomes in each choice context. However, there are other computational mechanisms that can explain context dependence in RL. In the present study, a frequency encoding model is introduced that assumes outcomes are evaluated based on their proportional rank within a sample of recently experienced outcomes from the local context. A range-frequency model is also considered that combines the range adaptation and frequency encoding mechanisms. We conducted two fully incentivized behavioral experiments using choice tasks for which the candidate models make divergent predictions. The results were most consistent with models that incorporate frequency or rank-based encoding. The findings from these experiments deepen our understanding of the underlying computational processes mediating context-dependent outcome encoding in human RL.
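The abstract contrasts four candidate encoding rules. A minimal sketch of how each transforms a raw outcome given a sample of recently experienced outcomes from the local context — not the authors' code; the function names, the tie handling in the rank rule, and the equal weighting in the range-frequency rule are illustrative assumptions:

```python
# Four context-dependent outcome-encoding rules described in the abstract,
# each applied to a raw outcome r given recent outcomes from one context.

def centered(r, context):
    # Reference point centering: outcome relative to the running
    # estimate of the average reward in this context.
    return r - sum(context) / len(context)

def range_adapted(r, context):
    # Range adaptation: distance from the minimum reward, scaled by an
    # estimate of the outcome range in this context.
    lo, hi = min(context), max(context)
    return (r - lo) / (hi - lo) if hi > lo else 0.5

def frequency(r, context):
    # Frequency encoding: proportional rank of r within the sample of
    # recently experienced outcomes (ties counted as "at or below").
    return sum(1 for x in context if x <= r) / len(context)

def range_frequency(r, context, w=0.5):
    # Range-frequency: a weighted mix of the two mechanisms
    # (the 50/50 weighting here is an assumption).
    return w * range_adapted(r, context) + (1 - w) * frequency(r, context)

recent = [10, 20, 20, 40, 50]
print(range_adapted(30, recent))   # 0.5 — midway through the range
print(frequency(30, recent))       # 0.6 — outranks 3 of 5 outcomes
```

The example outcome 30 illustrates where the models diverge: it sits exactly mid-range but above most recent outcomes, so range-based and rank-based rules assign it different subjective values — the kind of divergence the paper's choice tasks are built to exploit.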


Similar articles

1. Testing models of context-dependent outcome encoding in reinforcement learning.
   Cognition. 2023 Jan;230:105280. doi: 10.1016/j.cognition.2022.105280. Epub 2022 Sep 12.
2. Intrinsic rewards explain context-sensitive valuation in reinforcement learning.
   PLoS Biol. 2023 Jul 17;21(7):e3002201. doi: 10.1371/journal.pbio.3002201. eCollection 2023 Jul.
3. Effects of blocked versus interleaved training on relative value learning.
   Psychon Bull Rev. 2023 Oct;30(5):1895-1907. doi: 10.3758/s13423-023-02290-6. Epub 2023 Apr 18.
4. Recent Opioid Use Impedes Range Adaptation in Reinforcement Learning in Human Addiction.
   Biol Psychiatry. 2024 May 15;95(10):974-984. doi: 10.1016/j.biopsych.2023.12.005. Epub 2023 Dec 13.
5. Multiple memory systems as substrates for multiple decision systems.
   Neurobiol Learn Mem. 2015 Jan;117:4-13. doi: 10.1016/j.nlm.2014.04.014. Epub 2014 May 15.
6. The functional form of value normalization in human reinforcement learning.
   Elife. 2023 Jul 10;12:e83891. doi: 10.7554/eLife.83891.
7. Asymmetric and adaptive reward coding via normalized reinforcement learning.
   PLoS Comput Biol. 2022 Jul 21;18(7):e1010350. doi: 10.1371/journal.pcbi.1010350. eCollection 2022 Jul.
8. The Effect of Counterfactual Information on Outcome Value Coding in Medial Prefrontal and Cingulate Cortex: From an Absolute to a Relative Neural Code.
   J Neurosci. 2020 Apr 15;40(16):3268-3277. doi: 10.1523/JNEUROSCI.1712-19.2020. Epub 2020 Mar 10.
9. Heterogeneity of strategy use in the Iowa gambling task: a comparison of win-stay/lose-shift and reinforcement learning models.
   Psychon Bull Rev. 2013 Apr;20(2):364-71. doi: 10.3758/s13423-012-0324-9.
10. Entropy-based metrics for predicting choice behavior based on local response to reward.
   Nat Commun. 2021 Nov 12;12(1):6567. doi: 10.1038/s41467-021-26784-w.

Cited by

1. Relative Value Encoding in Large Language Models: A Multi-Task, Multi-Model Investigation.
   Open Mind (Camb). 2025 May 9;9:709-725. doi: 10.1162/opmi_a_00209. eCollection 2025.
2. The timescale and direction of influence of a third inferior alternative in human value-learning.
   Commun Psychol. 2025 Apr 5;3(1):56. doi: 10.1038/s44271-025-00229-2.
3. Comparing experience- and description-based economic preferences across 11 countries.
   Nat Hum Behav. 2024 Aug;8(8):1554-1567. doi: 10.1038/s41562-024-01894-9. Epub 2024 Jun 14.
4. Frequent winners explain apparent skewness preferences in experience-based decisions.
   Proc Natl Acad Sci U S A. 2024 Mar 19;121(12):e2317751121. doi: 10.1073/pnas.2317751121. Epub 2024 Mar 15.
5. Recent Opioid Use Impedes Range Adaptation in Reinforcement Learning in Human Addiction.
   Biol Psychiatry. 2024 May 15;95(10):974-984. doi: 10.1016/j.biopsych.2023.12.005. Epub 2023 Dec 13.
6. Intrinsic rewards explain context-sensitive valuation in reinforcement learning.
   PLoS Biol. 2023 Jul 17;21(7):e3002201. doi: 10.1371/journal.pbio.3002201. eCollection 2023 Jul.
7. The Future of Decisions From Experience: Connecting Real-World Decision Problems to Cognitive Processes.
   Perspect Psychol Sci. 2024 Jan;19(1):82-102. doi: 10.1177/17456916231179138. Epub 2023 Jun 30.
8. Effects of blocked versus interleaved training on relative value learning.
   Psychon Bull Rev. 2023 Oct;30(5):1895-1907. doi: 10.3758/s13423-023-02290-6. Epub 2023 Apr 18.
9. Outcome context-dependence is not WEIRD: Comparing reinforcement- and description-based economic preferences worldwide.
   Res Sq. 2023 Mar 2:rs.3.rs-2621222. doi: 10.21203/rs.3.rs-2621222/v1.