Department of Psychology, University of South Carolina, USA.
Cognition. 2023 Jan;230:105280. doi: 10.1016/j.cognition.2022.105280. Epub 2022 Sep 12.
Previous studies of reinforcement learning (RL) have established that choice outcomes are encoded in a context-dependent fashion. Several computational models have been proposed to explain context-dependent encoding, including reference point centering and range adaptation models. The former assumes that outcomes are centered around a running estimate of the average reward in each choice context, while the latter assumes that outcomes are compared to the minimum reward and then scaled by an estimate of the range of outcomes in each choice context. However, there are other computational mechanisms that can explain context dependence in RL. In the present study, a frequency encoding model is introduced that assumes outcomes are evaluated based on their proportional rank within a sample of recently experienced outcomes from the local context. A range-frequency model is also considered that combines the range adaptation and frequency encoding mechanisms. We conducted two fully incentivized behavioral experiments using choice tasks for which the candidate models make divergent predictions. The results were most consistent with models that incorporate frequency or rank-based encoding. The findings from these experiments deepen our understanding of the underlying computational processes mediating context-dependent outcome encoding in human RL.
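To make the contrast between these encoding schemes concrete, the following is a minimal Python sketch of the four outcome transformations as they are described in the abstract: reference point centering, range adaptation, frequency (rank-based) encoding, and their range-frequency combination. The function names, the running-statistic inputs, and the mixing weight `w` are illustrative assumptions, not the authors' implementation or parameterization.

```python
import numpy as np

def centered_outcome(r, context_avg):
    """Reference-point centering: subtract a running estimate of the
    average reward in the current choice context (illustrative form)."""
    return r - context_avg

def range_adapted_outcome(r, context_min, context_max, eps=1e-8):
    """Range adaptation: compare the outcome to the context minimum and
    scale by the estimated range of outcomes in that context."""
    return (r - context_min) / max(context_max - context_min, eps)

def frequency_encoded_outcome(r, recent_outcomes):
    """Frequency encoding: evaluate the outcome by its proportional rank
    within a sample of recently experienced outcomes from the context."""
    sample = np.asarray(recent_outcomes, dtype=float)
    if sample.size == 0:
        return 0.5  # neutral value when no local history is available
    return float(np.mean(sample <= r))

def range_frequency_outcome(r, recent_outcomes, w=0.5):
    """Range-frequency encoding: a weighted mixture of the range-adapted
    value and the proportional rank (w is a hypothetical mixing weight)."""
    sample = np.asarray(recent_outcomes, dtype=float)
    rng = range_adapted_outcome(r, sample.min(), sample.max())
    freq = frequency_encoded_outcome(r, sample)
    return w * rng + (1 - w) * freq
```

In a standard RL framing, any of these subjective outcome values could then replace the objective reward in a delta-rule value update; which transformation best captures behavior is the empirical question the two experiments address.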