Don Hilary J, Worthy Darrell A
Department of Psychological and Brain Sciences.
J Exp Psychol Learn Mem Cogn. 2022 Sep;48(9):1311-1327. doi: 10.1037/xlm0000896. Epub 2021 Apr 19.
Recent work in reinforcement learning has demonstrated a choice preference for an option that has a lower probability of reward (A) over an alternative option that has a higher probability of reward (C) when A has been experienced more frequently than C (the frequency effect). This finding is critical because it is inconsistent with the widespread assumption that expected value is based on average reward, and instead suggests that value is based on cumulative instances of reward. However, option frequency may also affect instrumental reinforcement of choosing A during training, which may then transfer to choices on AC trials. This study therefore aimed to assess the contributions of action reinforcement and option value to the frequency effect across 2 experiments. In both experiments, we included an additional test phase in which participants were asked to rate the likelihood of reward for each choice option, a response that should be unaffected by action reinforcement. In Experiment 1, participants completed the original choice training phase. In Experiment 2, participants were presented with each option individually, thus removing reinforcement of choice during training. Single-cue training reduced the strength of the preference for A compared to choice training, suggesting a contributing role of action reinforcement. However, frequency effects were still evident in both experiments. We found that the pattern of reward likelihood ratings was consistent with the pattern of choice preferences in both experiments, suggesting that action reinforcement may also influence judgements about the likelihood of receiving reward. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
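The contrast between the two value accounts can be made concrete with a minimal sketch. It assumes two simplified value rules: an "average" rule (mean reward per experience, a stand-in for delta-rule models whose estimates track average reward) and a "cumulative" rule (total number of rewarded experiences). The function names and the training histories are hypothetical illustrations, not the authors' model or data, and the sketch does not model the action-reinforcement account discussed above.

```python
def average_value(outcomes):
    """Value as mean reward per experience (average-reward assumption)."""
    return sum(outcomes) / len(outcomes)


def cumulative_value(outcomes):
    """Value as the total count of rewarded experiences (cumulative account)."""
    return sum(outcomes)


def preferred(values):
    """Return the option with the highest value."""
    return max(values, key=values.get)


# Hypothetical training histories (NOT data from the study); 1 = reward, 0 = none.
# A: lower reward probability (.65) but experienced 100 times.
# C: higher reward probability (.75) but experienced only 40 times.
history = {
    "A": [1] * 65 + [0] * 35,
    "C": [1] * 30 + [0] * 10,
}

avg = {option: average_value(o) for option, o in history.items()}
cum = {option: cumulative_value(o) for option, o in history.items()}

print("average rule:", avg, "-> prefers", preferred(avg))
print("cumulative rule:", cum, "-> prefers", preferred(cum))
```

Under these assumed histories, the average rule favours C (0.75 vs 0.65), whereas the cumulative rule favours the more frequently experienced A (65 vs 30 rewards), which is the pattern the frequency effect describes.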