Laboratoire de Neurosciences Cognitives Computationnelles, Institut National de la Santé et de la Recherche Médicale, 29 rue d'Ulm, 75005, Paris, France.
Département d'Etudes Cognitives, Ecole Normale Supérieure, Paris, 75005, France.
Nat Commun. 2018 Oct 29;9(1):4503. doi: 10.1038/s41467-018-06781-2.
In economics and perceptual decision-making, contextual effects are well documented: decision weights are adjusted as a function of the distribution of stimuli. Yet, in the reinforcement learning literature, whether and how contextual information pertaining to decision states is integrated into learning algorithms has received comparatively little attention. Here, we investigate reinforcement learning behavior and its computational substrates in a task where we orthogonally manipulate outcome valence and magnitude, resulting in systematic variations in state-values. Model comparison indicates that subjects' behavior is best accounted for by an algorithm that includes both reference-point dependence and range adaptation, two crucial features of state-dependent valuation. In addition, we find that state-dependent outcome valuation emerges progressively, is favored by increasing outcome information, and correlates with explicit understanding of the task structure. Finally, our data clearly show that, while locally adaptive (for instance in negative-valence and small-magnitude contexts), state-dependent valuation comes at the cost of seemingly irrational choices when options are extrapolated out of their original contexts.
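As a rough illustration of what reference-point dependence and range adaptation could look like in a learning rule, the sketch below is a hypothetical parameterisation, not the authors' fitted model: it tracks a context-level value (reference point) and a context-level spread alongside option values, then centres each outcome on the context value and divides by the learned spread before updating the chosen option. Names such as RelativeQLearner, alpha_ctx, and the spread estimate are illustrative assumptions.

```python
import numpy as np

class RelativeQLearner:
    """Sketch of a state-dependent valuation learner (hypothetical parameterisation)."""

    def __init__(self, n_options, alpha=0.3, alpha_ctx=0.3, beta=5.0):
        self.q = np.zeros(n_options)  # context-relative option values
        self.v = 0.0                  # context value (reference point)
        self.s = 1.0                  # context spread (range-like scaling factor)
        self.alpha, self.alpha_ctx, self.beta = alpha, alpha_ctx, beta

    def choose(self):
        # softmax choice over context-relative option values
        p = np.exp(self.beta * self.q)
        p /= p.sum()
        return np.random.choice(len(self.q), p=p)

    def update(self, option, outcome):
        # update the reference point and the spread from the raw outcome
        self.v += self.alpha_ctx * (outcome - self.v)
        self.s += self.alpha_ctx * (abs(outcome - self.v) - self.s)
        # relative outcome: centred on the context value, divided by the spread
        rel = (outcome - self.v) / max(self.s, 1e-6)
        # standard delta-rule update on the relative scale
        self.q[option] += self.alpha * (rel - self.q[option])
```

On this relative scale, the better option of a negative-valence, small-magnitude context still acquires the higher value, which is locally adaptive; but because values are stored context-relative, directly comparing options drawn from different contexts can yield the kind of seemingly irrational extrapolation choices described above.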