The pursuit of happiness: A reinforcement learning perspective on habituation and comparisons.

Affiliations

Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America.

Department of Psychology, Princeton University, Princeton, New Jersey, United States of America.

Publication information

PLoS Comput Biol. 2022 Aug 4;18(8):e1010316. doi: 10.1371/journal.pcbi.1010316. eCollection 2022 Aug.

DOI:10.1371/journal.pcbi.1010316
PMID:35925875
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9352009/
Abstract

In evaluating our choices, we often suffer from two tragic relativities. First, when our lives change for the better, we rapidly habituate to the higher standard of living. Second, we cannot escape comparing ourselves to various relative standards. Habituation and comparisons can be very disruptive to decision-making and happiness, and to date, it remains a puzzle why they have come to be a part of cognition in the first place. Here, we present computational evidence that suggests that these features might play an important role in promoting adaptive behavior. Using the framework of reinforcement learning, we explore the benefit of employing a reward function that, in addition to the reward provided by the underlying task, also depends on prior expectations and relative comparisons. We find that while agents equipped with this reward function are less happy, they learn faster and significantly outperform standard reward-based agents in a wide range of environments. Specifically, we find that relative comparisons speed up learning by providing an exploration incentive to the agents, and prior expectations serve as a useful aid to comparisons, especially in sparsely rewarded and non-stationary environments. Our simulations also reveal potential drawbacks of this reward function and show that agents perform sub-optimally when comparisons are left unchecked and when there are too many similar options. Together, our results help explain why we are prone to becoming trapped in a cycle of never-ending wants and desires, and may shed light on psychopathologies such as depression, materialism, and overconsumption.
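
The reward function described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The weighted-sum form, the weights `w_task`, `w_expect`, `w_compare`, the fixed `aspiration` level, the running-average expectation, and the two-armed bandit are all assumptions made here for illustration only.

```python
import random


def subjective_reward(r, expected, aspiration,
                      w_task=1.0, w_expect=0.5, w_compare=0.5):
    """Task reward plus two relative terms (hypothetical form and weights).

    r          -- objective reward from the task
    expected   -- prior expectation (habituation: rises as outcomes improve)
    aspiration -- relative standard the outcome is compared against
    """
    return (w_task * r
            + w_expect * (r - expected)
            + w_compare * (r - aspiration))


def run_bandit(means, steps=2000, alpha=0.1, epsilon=0.1,
               aspiration=1.0, relative=True, seed=0):
    """Epsilon-greedy agent on a Gaussian bandit (illustrative assumption)."""
    rng = random.Random(seed)
    q = [0.0] * len(means)   # value estimates learned from the training signal
    expected = 0.0           # running average of past objective rewards
    total = 0.0              # objective reward actually collected
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(len(means))                    # explore
        else:
            a = max(range(len(means)), key=lambda i: q[i])   # exploit
        r = rng.gauss(means[a], 0.1)
        total += r
        # The agent learns from the subjective signal, not the raw reward.
        signal = subjective_reward(r, expected, aspiration) if relative else r
        q[a] += alpha * (signal - q[a])
        expected += alpha * (r - expected)                   # habituate
    return q, total / steps


if __name__ == "__main__":
    for relative in (False, True):
        q, avg = run_bandit([0.2, 0.8], relative=relative)
        print(f"relative={relative}: mean objective reward = {avg:.3f}")
```

In this sketch the same objective outcome yields a smaller signal once `expected` has caught up with it, which is the habituation effect. When `aspiration` is set above the rewards on offer, arms that have been tried can look worse than untried, zero-initialized ones, which can nudge a greedy agent toward exploring alternatives.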

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1795/9352009/d370966f5cf0/pcbi.1010316.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1795/9352009/3f0cbc2205a6/pcbi.1010316.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1795/9352009/d17c615f8fa2/pcbi.1010316.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1795/9352009/a7d7d7102695/pcbi.1010316.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1795/9352009/cc70954f693f/pcbi.1010316.g005.jpg