Department of Psychology, Columbia University, New York, NY 10027, USA.
Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA.
Sci Adv. 2019 Jul 31;5(7):eaaw2089. doi: 10.1126/sciadv.aaw2089. eCollection 2019 Jul.
Most accounts of behavior in nonhuman animals assume that they make choices to maximize expected reward value. However, model-free reinforcement learning based on reward associations cannot account for choice behavior in transitive inference paradigms. We manipulated the amount of reward associated with each item of an ordered list, so that maximizing expected reward value was always in conflict with decision rules based on the implicit list order. Under such a schedule, model-free reinforcement algorithms cannot achieve high levels of accuracy, even after extensive training. Monkeys nevertheless learned to make correct rule-based choices. These results show that monkeys' performance in transitive inference paradigms is not driven solely by expected reward and that appropriate inferences are made despite discordant reward incentives. We show that their choices can be explained by an abstract, model-based representation of list order, and we provide a method for inferring the contents of such representations from observed data.
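To illustrate why a purely model-free learner struggles under such a schedule, here is a minimal sketch (in Python; the item count, reward magnitudes, learning rate, and softmax temperature are illustrative assumptions, not the study's actual task or model). The agent tracks a single reward value per item and chooses between paired items by softmax; only rule-correct choices (the earlier item of the pair) are rewarded, but the reward magnitude attached to each item grows against the list order, so the learned values cannot settle into an ordering consistent with the list and accuracy stays well below the rule-based ceiling.

    # Illustrative toy simulation (assumed parameters, not the authors' task design):
    # a model-free learner that tracks one scalar reward value per item,
    # trained on pairs from an ordered list where reward magnitudes
    # conflict with the implicit list order.
    import numpy as np

    rng = np.random.default_rng(0)

    n_items = 7
    # Hypothetical reward magnitudes: later-ranked items pay more when
    # chosen correctly, putting expected value at odds with the rule.
    reward = np.arange(1, n_items + 1, dtype=float)

    V = np.zeros(n_items)        # model-free item values
    alpha, beta = 0.1, 3.0       # learning rate, softmax inverse temperature

    n_trials = 20000
    correct = np.zeros(n_trials, dtype=bool)

    for t in range(n_trials):
        # sample a random pair; the lower-ranked index is the rule-correct item
        i, j = sorted(rng.choice(n_items, size=2, replace=False))
        p_earlier = 1.0 / (1.0 + np.exp(-beta * (V[i] - V[j])))
        chose_earlier = rng.random() < p_earlier
        correct[t] = chose_earlier
        chosen = i if chose_earlier else j
        # only rule-correct choices are rewarded, with item-specific magnitude
        r = reward[chosen] if chose_earlier else 0.0
        V[chosen] += alpha * (r - V[chosen])

    print(f"accuracy over last 5000 trials: {correct[-5000:].mean():.2f}")
    print("learned item values:", np.round(V, 2))

Because this learner carries no representation of the list itself, only item-by-item reward histories, there is no setting of the learned values that reproduces rule-based choice under this kind of schedule; a model-based representation of list order, as the abstract argues, is needed instead.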