Prével Arthur, Krebs Ruth M
Department of Experimental Psychology, Ghent University, Ghent, Belgium.
Front Behav Neurosci. 2021 Nov 11;15:749517. doi: 10.3389/fnbeh.2021.749517. eCollection 2021.
In a new environment, humans and animals can detect and learn that cues predict meaningful outcomes, and use this information to adapt their responses. This process is termed Pavlovian conditioning. Pavlovian conditioning is also observed for stimuli that predict outcome-associated cues; this second type of conditioning is termed higher-order Pavlovian conditioning. In this review, we focus on higher-order conditioning studies with simultaneous and backward conditioned stimuli. We examine how the results from these experiments pose a challenge to models of Pavlovian conditioning such as Temporal Difference (TD) models, in which learning is mainly driven by reward prediction errors. Contrary to this view, the results suggest that humans and animals can form complex representations of the (temporal) structure of the task and use this information to guide behavior, which seems consistent with model-based reinforcement learning. Future investigations involving these procedures could yield important new insights into the mechanisms that underlie Pavlovian conditioning.
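One way to see why backward conditioned stimuli might be difficult for a purely prediction-error-driven account is a toy tabular TD(0) simulation. The sketch below is an illustration only, not a model taken from the review: the function name `td0_values`, the two-state trial layout, and the parameter values (`alpha`, `gamma`, `n_trials`) are all assumptions. Each trial is a short sequence of states, and the value of each state is updated by the prediction error delta = r + gamma * V(next) - V(current).

```python
def td0_values(trial, n_trials=200, alpha=0.1, gamma=0.9):
    """Tabular TD(0) over repeated identical trials (hypothetical helper).

    trial: list of (state, reward) pairs in temporal order, where reward
    is delivered while the agent is in that state.
    """
    values = {state: 0.0 for state, _ in trial}
    for _ in range(n_trials):
        for i, (state, reward) in enumerate(trial):
            # Value of the following state; 0.0 once the trial has ended.
            if i + 1 < len(trial):
                next_value = values[trial[i + 1][0]]
            else:
                next_value = 0.0
            # Reward prediction error for this step.
            delta = reward + gamma * next_value - values[state]
            values[state] += alpha * delta
    return values


# Forward conditioning: the cue (CS) precedes the outcome (US).
forward = td0_values([("CS", 0.0), ("US", 1.0)])

# Backward conditioning: the outcome (US) precedes the cue (CS).
backward = td0_values([("US", 1.0), ("CS", 0.0)])

print("forward  V(CS) =", round(forward["CS"], 3))
print("backward V(CS) =", round(backward["CS"], 3))
```

Under these assumptions the forward cue acquires value because it is followed by the outcome, whereas the backward cue is followed by nothing and its value never moves from zero. A learner of this simple kind therefore gives a backward cue no acquired value to pass on to a second-order stimulus, which is the sort of prediction the higher-order conditioning results discussed in the review are said to challenge.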