Elliot A. Ludvig, Richard S. Sutton, E. James Kehoe
Princeton Neuroscience Institute and Department of Mechanical & Aerospace Engineering, Princeton University, 3-N-12 Green Hall, Princeton, NJ 08542, USA.
Learn Behav. 2012 Sep;40(3):305-19. doi: 10.3758/s13420-012-0082-6.
The temporal-difference (TD) algorithm from reinforcement learning provides a simple method for incrementally learning predictions of upcoming events. Applied to classical conditioning, TD models suppose that animals learn a real-time prediction of the unconditioned stimulus (US) on the basis of all available conditioned stimuli (CSs). In the TD model, as in other error-correction models, learning is driven by prediction errors: the difference between the change in US prediction and the actual US. With the TD model, however, learning occurs continuously from moment to moment and is not artificially constrained to occur in trials. Accordingly, a key feature of any TD model is its assumption about how a CS is represented on a moment-to-moment basis. Here, we evaluate the performance of the TD model on a heretofore unexplored range of classical conditioning tasks. To do so, we consider three stimulus representations that vary in their degree of temporal generalization and evaluate how each representation influences the performance of the TD model on these conditioning tasks.
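To make the abstract's description of the learning rule concrete, the sketch below shows one way a linear TD(lambda) update could be applied to a single delay-conditioning trial. It assumes a complete-serial-compound style representation (one temporal element active per time step after CS onset) and illustrative parameter values; the representation choice, the variable names (alpha, gamma, lam, cs_onset, us_time), and the trial length are assumptions for illustration, not the paper's exact parameterization.

```python
import numpy as np

# Minimal TD(lambda) sketch for classical conditioning (illustrative only).
# Assumes a complete-serial-compound (CSC) representation: a distinct unit
# is active at each time step between CS onset and US delivery.

n_steps = 20                  # time steps within one nominal "trial"
alpha, gamma, lam = 0.1, 0.97, 0.9
cs_onset, us_time = 5, 15     # CS present from step 5; US delivered at step 15

w = np.zeros(n_steps)         # one weight per CSC element

def features(t):
    """Return the CSC feature vector at time step t."""
    x = np.zeros(n_steps)
    if cs_onset <= t < us_time:
        x[t] = 1.0
    return x

for trial in range(200):
    e = np.zeros(n_steps)     # eligibility trace, reset between trials here for simplicity
    x_prev = np.zeros(n_steps)
    for t in range(n_steps):
        x = features(t)
        us = 1.0 if t == us_time else 0.0
        # TD error: actual US plus the change in the (discounted) US prediction
        delta = us + gamma * (w @ x) - (w @ x_prev)
        e = gamma * lam * e + x_prev          # decaying trace of recently active features
        w += alpha * delta * e
        x_prev = x

# After training, w @ features(t) approximates the discounted US prediction at step t,
# rising as the expected US time approaches.
```

The same loop structure would apply to other representations (e.g., a single presence unit per CS, or temporally spread "microstimuli"); only the features function would change, which is what makes the representational assumption so consequential for the model's predictions.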