Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America.
Department of Psychology, Princeton University, Princeton, New Jersey, United States of America.
PLoS Comput Biol. 2022 Aug 4;18(8):e1010316. doi: 10.1371/journal.pcbi.1010316. eCollection 2022 Aug.
In evaluating our choices, we often suffer from two tragic relativities. First, when our lives change for the better, we rapidly habituate to the higher standard of living. Second, we cannot escape comparing ourselves to various relative standards. Habituation and comparisons can be very disruptive to decision-making and happiness, and to date it remains a puzzle why they came to be a part of cognition in the first place. Here, we present computational evidence suggesting that these features might play an important role in promoting adaptive behavior. Using the framework of reinforcement learning, we explore the benefit of employing a reward function that, in addition to the reward provided by the underlying task, also depends on prior expectations and relative comparisons. We find that while agents equipped with this reward function are less happy, they learn faster and significantly outperform standard reward-based agents in a wide range of environments. Specifically, we find that relative comparisons speed up learning by providing an exploration incentive to the agents, and that prior expectations serve as a useful aid to comparisons, especially in sparsely rewarded and non-stationary environments. Our simulations also reveal potential drawbacks of this reward function and show that agents perform sub-optimally when comparisons are left unchecked and when there are too many similar options. Together, our results help explain why we are prone to becoming trapped in a cycle of never-ending wants and desires, and may shed light on psychopathologies such as depression, materialism, and overconsumption.
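To make the idea of a reward that depends on the task reward, prior expectations, and relative comparisons concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the functional form, so the linear weighting, the names `RewardWeights`, `subjective_reward`, and `update_expectation`, the default weights, and the running-average update are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weights for the three reward components."""
    task: float = 1.0         # weight on the reward from the underlying task
    expectation: float = 0.5  # weight on the deviation from prior expectation
    comparison: float = 0.5   # weight on the deviation from a relative standard


def subjective_reward(task_reward, expected_reward, aspiration,
                      w=RewardWeights()):
    """Combine the task reward with expectation and comparison terms.

    Assumed form (not taken from the paper): a weighted sum of the task
    reward, the reward minus what the agent expected, and the reward
    minus a reference standard the agent compares itself to.
    """
    expectation_term = task_reward - expected_reward
    comparison_term = task_reward - aspiration
    return (w.task * task_reward
            + w.expectation * expectation_term
            + w.comparison * comparison_term)


def update_expectation(expected_reward, task_reward, rate=0.1):
    """Move the expectation toward the latest reward (assumed running average)."""
    return expected_reward + rate * (task_reward - expected_reward)


def habituation_trace(task_reward, aspiration, steps, expected_reward=0.0):
    """Subjective rewards for the same task reward received `steps` times."""
    trace = []
    for _ in range(steps):
        trace.append(subjective_reward(task_reward, expected_reward, aspiration))
        expected_reward = update_expectation(expected_reward, task_reward)
    return trace
```

Under these assumptions, calling `habituation_trace(1.0, 0.5, 5)` yields a sequence that shrinks step by step even though the task reward never changes, which mirrors the habituation described in the abstract. Raising `aspiration` lowers every value in the sequence, which mirrors the cost of an unfavorable comparison.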