Adam Johnson, A David Redish
Center for Cognitive Sciences and Graduate Program in Neuroscience, University of Minnesota, MN 55455, USA.
Neural Netw. 2005 Nov;18(9):1163-71. doi: 10.1016/j.neunet.2005.08.009. Epub 2005 Sep 29.
Temporal difference reinforcement learning (TDRL) algorithms, hypothesized to partially explain basal ganglia functionality, learn more slowly than real animals. Modified TDRL algorithms (e.g., the Dyna-Q family) learn faster than standard TDRL by practicing experienced sequences offline. We suggest that the replay phenomenon, in which ensembles of hippocampal neurons replay previously experienced firing sequences during subsequent rest and sleep, may provide practice sequences that improve the speed of TDRL learning, even within a single session. We test the plausibility of this hypothesis in a computational model of a multiple-T choice task. Rats show two learning rates on this task: a fast decrease in errors and a slow development of a stereotyped path. Adding developing replay to the model accelerates learning of the correct path but slows the stereotyping of that path. These models provide testable predictions about the effects of hippocampal inactivation and of hippocampal replay on this task.
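For readers unfamiliar with the Dyna-Q idea referenced above, the following is a minimal sketch (not the authors' model) of tabular Q-learning with offline replay of stored transitions, showing how practicing experienced sequences between trials can speed up learning. The toy environment, parameter values, and all function and class names (ChainChoiceTask, run, replay_steps, etc.) are illustrative assumptions, not details taken from the paper.

import random
from collections import defaultdict

class ChainChoiceTask:
    """Toy stand-in for a multiple-T choice task: a chain of binary choice
    points; the correct action at each point advances the agent, a wrong
    turn gives a small penalty and no progress."""
    def __init__(self, n_choice_points=4):
        self.n = n_choice_points
        self.correct = [random.randint(0, 1) for _ in range(self.n)]
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        if action == self.correct[self.state]:
            self.state += 1
            done = self.state == self.n
            return self.state, (1.0 if done else 0.0), done
        return self.state, -0.1, False  # wrong turn: penalty, stay at this choice point

def run(episodes=50, replay_steps=0, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Q-learning agent; replay_steps > 0 adds Dyna-Q-style offline updates
    drawn from a memory of previously experienced transitions."""
    env = ChainChoiceTask()
    Q = defaultdict(float)   # (state, action) -> value estimate
    memory = []              # stored (s, a, r, s2, done) experience tuples
    errors_per_episode = []

    def greedy(s):
        return max((0, 1), key=lambda a: Q[(s, a)])

    def td_update(s, a, r, s2, done):
        target = r if done else r + gamma * max(Q[(s2, 0)], Q[(s2, 1)])
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    for _ in range(episodes):
        s, done, errors = env.reset(), False, 0
        while not done:
            a = random.randint(0, 1) if random.random() < epsilon else greedy(s)
            s2, r, done = env.step(a)
            errors += r < 0
            td_update(s, a, r, s2, done)     # online TD update from experience
            memory.append((s, a, r, s2, done))
            s = s2
        # Offline "replay" between trials: practice stored experience.
        for _ in range(replay_steps):
            td_update(*random.choice(memory))
        errors_per_episode.append(errors)
    return errors_per_episode

if __name__ == "__main__":
    random.seed(0)
    print("no replay :", sum(run(replay_steps=0)[:10]), "errors in first 10 trials")
    random.seed(0)
    print("with replay:", sum(run(replay_steps=50)[:10]), "errors in first 10 trials")

In this sketch the only difference between the two runs is the offline replay loop; with replay, early-trial errors typically drop faster, which is the qualitative effect the abstract attributes to hippocampal replay feeding a TDRL learner.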