Two sides of the same coin: Beneficial and detrimental consequences of range adaptation in human reinforcement learning.

Authors

Bavard Sophie, Rustichini Aldo, Palminteri Stefano

Affiliations

Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Santé et de la Recherche Médicale, 29 rue d'Ulm, 75005 Paris, France.

Ecole normale supérieure, 29 rue d'Ulm, 75005 Paris, France.

Publication

Sci Adv. 2021 Apr 2;7(14). doi: 10.1126/sciadv.abe0340. Print 2021 Apr.

DOI: 10.1126/sciadv.abe0340
PMID: 33811071
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11060039/
Abstract

Evidence suggests that economic values are rescaled as a function of the range of the available options. Although locally adaptive, range adaptation has been shown to lead to suboptimal choices, particularly notable in reinforcement learning (RL) situations when options are extrapolated from their original context to a new one. Range adaptation can be seen as the result of an adaptive coding process aiming at increasing the signal-to-noise ratio. However, this hypothesis leads to a counterintuitive prediction: Decreasing task difficulty should increase range adaptation and, consequently, extrapolation errors. Here, we tested the paradoxical relation between range adaptation and performance in a large sample of participants performing variants of an RL task, where we manipulated task difficulty. Results confirmed that range adaptation induces systematic extrapolation errors and is stronger when decreasing task difficulty. Last, we propose a range-adapting model and show that it is able to parsimoniously capture all the behavioral results.

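The range-adapting idea described in the abstract — outcomes rescaled to the reward range of the local context before driving learning — can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the authors' published equations: the function name `range_adapted_update`, the delta-rule tracking of the range bounds, and the full (rather than partial) normalization are all hypothetical choices for exposition.

```python
def range_adapted_update(q, reward, r_min, r_max, alpha_q=0.3, alpha_r=0.3):
    """One trial of a simple range-adapting learner (illustrative sketch only).

    q            : current value estimate of the chosen option
    reward       : objective outcome on this trial
    r_min, r_max : running estimates of the contextual reward range
    Returns the updated (q, r_min, r_max).
    """
    # Track the contextual range bounds with their own delta rules:
    # the bounds drift toward the most extreme outcomes observed.
    r_max = r_max + alpha_r * (max(reward, r_max) - r_max)
    r_min = r_min + alpha_r * (min(reward, r_min) - r_min)

    # Rescale the outcome to the estimated range (full range adaptation).
    # A narrower estimated range inflates the subjective value of a given
    # outcome, which is what produces extrapolation errors in a new context.
    span = r_max - r_min
    subjective = (reward - r_min) / span if span > 0 else reward

    # Standard delta-rule update, driven by the normalized outcome.
    q = q + alpha_q * (subjective - q)
    return q, r_min, r_max
```

Under this sketch, values learned in a low-stakes context and a high-stakes context converge to similar normalized levels, so comparing options transferred across contexts can favor the objectively worse one — the "detrimental" side the abstract describes.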

Figures (PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72d3/11060039/ab74b6fadcbc/abe0340-f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72d3/11060039/e137eafd75ed/abe0340-f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72d3/11060039/8f3d804beaa6/abe0340-f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72d3/11060039/a577fb0cb3b0/abe0340-f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72d3/11060039/a79b6d4feef9/abe0340-f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72d3/11060039/ef4ce8ffb6e1/abe0340-f6.jpg

Similar Articles

1
Two sides of the same coin: Beneficial and detrimental consequences of range adaptation in human reinforcement learning.
Sci Adv. 2021 Apr 2;7(14). doi: 10.1126/sciadv.abe0340. Print 2021 Apr.
2
Medial prefrontal cortex and the adaptive regulation of reinforcement learning parameters.
Prog Brain Res. 2013;202:441-64. doi: 10.1016/B978-0-444-62604-2.00022-8.
3
Reference-point centering and range-adaptation enhance human reinforcement learning at the cost of irrational preferences.
Nat Commun. 2018 Oct 29;9(1):4503. doi: 10.1038/s41467-018-06781-2.
4
Reinforcement learning in and out of context: The effects of attentional focus.
J Exp Psychol Learn Mem Cogn. 2023 Aug;49(8):1193-1217. doi: 10.1037/xlm0001145. Epub 2022 Jul 4.
5
Intrinsic rewards explain context-sensitive valuation in reinforcement learning.
PLoS Biol. 2023 Jul 17;21(7):e3002201. doi: 10.1371/journal.pbio.3002201. eCollection 2023 Jul.
6
Neural Index of Reinforcement Learning Predicts Improved Stimulus-Response Retention under High Working Memory Load.
J Neurosci. 2023 Apr 26;43(17):3131-3143. doi: 10.1523/JNEUROSCI.1274-22.2023. Epub 2023 Mar 17.
7
Adaptive Prediction Error Coding in the Human Midbrain and Striatum Facilitates Behavioral Adaptation and Learning Efficiency.
Neuron. 2016 Jun 1;90(5):1127-38. doi: 10.1016/j.neuron.2016.04.019. Epub 2016 May 12.
8
Adaptive coordination of working-memory and reinforcement learning in non-human primates performing a trial-and-error problem solving task.
Behav Brain Res. 2018 Dec 14;355:76-89. doi: 10.1016/j.bbr.2017.09.030. Epub 2017 Oct 20.
9
How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis.
Eur J Neurosci. 2012 Apr;35(7):1024-35. doi: 10.1111/j.1460-9568.2011.07980.x.
10
Nutrient-Sensitive Reinforcement Learning in Monkeys.
J Neurosci. 2023 Mar 8;43(10):1714-1730. doi: 10.1523/JNEUROSCI.0752-22.2022. Epub 2023 Jan 20.

Cited By

1
Social inequity disrupts reward-based learning.
Commun Psychol. 2025 Aug 16;3(1):125. doi: 10.1038/s44271-025-00300-y.
2
Estimation-uncertainty affects decisions with and without learning opportunities.
Nat Commun. 2025 Jul 21;16(1):6706. doi: 10.1038/s41467-025-61960-2.
3
Relative Value Encoding in Large Language Models: A Multi-Task, Multi-Model Investigation.
Open Mind (Camb). 2025 May 9;9:709-725. doi: 10.1162/opmi_a_00209. eCollection 2025.
4
The timescale and direction of influence of a third inferior alternative in human value-learning.
Commun Psychol. 2025 Apr 5;3(1):56. doi: 10.1038/s44271-025-00229-2.
5
Comparing experience- and description-based economic preferences across 11 countries.
Nat Hum Behav. 2024 Aug;8(8):1554-1567. doi: 10.1038/s41562-024-01894-9. Epub 2024 Jun 14.
6
Foraging in a non-foraging task: Fitness maximization explains human risk preference dynamics under changing environment.
PLoS Comput Biol. 2024 May 13;20(5):e1012080. doi: 10.1371/journal.pcbi.1012080. eCollection 2024 May.
7
Recent Opioid Use Impedes Range Adaptation in Reinforcement Learning in Human Addiction.
Biol Psychiatry. 2024 May 15;95(10):974-984. doi: 10.1016/j.biopsych.2023.12.005. Epub 2023 Dec 13.
8
Intrinsic rewards explain context-sensitive valuation in reinforcement learning.
PLoS Biol. 2023 Jul 17;21(7):e3002201. doi: 10.1371/journal.pbio.3002201. eCollection 2023 Jul.
9
The functional form of value normalization in human reinforcement learning.
Elife. 2023 Jul 10;12:e83891. doi: 10.7554/eLife.83891.
10
The Future of Decisions From Experience: Connecting Real-World Decision Problems to Cognitive Processes.
Perspect Psychol Sci. 2024 Jan;19(1):82-102. doi: 10.1177/17456916231179138. Epub 2023 Jun 30.

References

1
A map of decoy influence in human multialternative choice.
Proc Natl Acad Sci U S A. 2020 Oct 6;117(40):25169-25178. doi: 10.1073/pnas.2005058117. Epub 2020 Sep 21.
2
The Effect of Counterfactual Information on Outcome Value Coding in Medial Prefrontal and Cingulate Cortex: From an Absolute to a Relative Neural Code.
J Neurosci. 2020 Apr 15;40(16):3268-3277. doi: 10.1523/JNEUROSCI.1712-19.2020. Epub 2020 Mar 10.
3
Value-based attention but not divisive normalization influences decisions with multiple alternatives.
Nat Hum Behav. 2020 Jun;4(6):634-645. doi: 10.1038/s41562-020-0822-0. Epub 2020 Feb 3.
4
Decomposing the effects of context valence and feedback information on speed and accuracy during reinforcement learning: a meta-analytical approach using diffusion decision modeling.
Cogn Affect Behav Neurosci. 2019 Jun;19(3):490-502. doi: 10.3758/s13415-019-00723-1.
5
Contextual influence on confidence judgments in human reinforcement learning.
PLoS Comput Biol. 2019 Apr 8;15(4):e1006973. doi: 10.1371/journal.pcbi.1006973. eCollection 2019 Apr.
6
Partial Adaptation to the Value Range in the Macaque Orbitofrontal Cortex.
J Neurosci. 2019 May 1;39(18):3498-3513. doi: 10.1523/JNEUROSCI.2279-18.2019. Epub 2019 Mar 4.
7
Habits without values.
Psychol Rev. 2019 Mar;126(2):292-311. doi: 10.1037/rev0000120. Epub 2019 Jan 24.
8
Stimulus control of actions and habits: A role for reinforcer predictability and attention in the development of habitual behavior.
J Exp Psychol Anim Learn Cogn. 2018 Oct;44(4):370-384. doi: 10.1037/xan0000188.
9
Reference-point centering and range-adaptation enhance human reinforcement learning at the cost of irrational preferences.
Nat Commun. 2018 Oct 29;9(1):4503. doi: 10.1038/s41467-018-06781-2.
10
Optimal coding and neuronal adaptation in economic decisions.
Nat Commun. 2017 Oct 31;8(1):1208. doi: 10.1038/s41467-017-01373-y.