Viejo Guillaume, Girard Benoît, Procyk Emmanuel, Khamassi Mehdi
Sorbonne Universités, UPMC Univ Paris 06, CNRS, Institute of Intelligent Systems and Robotics (ISIR), F-75005 Paris, France; Montreal Neurological Institute and Hospital, 3801 University Street, Montreal, Quebec, Canada.
Sorbonne Universités, UPMC Univ Paris 06, CNRS, Institute of Intelligent Systems and Robotics (ISIR), F-75005 Paris, France.
Behav Brain Res. 2018 Dec 14;355:76-89. doi: 10.1016/j.bbr.2017.09.030. Epub 2017 Oct 20.
Accumulating evidence suggests that human behavior in trial-and-error learning tasks based on decisions between discrete actions may involve a combination of reinforcement learning (RL) and working memory (WM). While the understanding of brain activity at stake in this type of task often involves comparison with non-human primate neurophysiological results, it is not clear whether monkeys use similar combined RL and WM processes to solve these tasks. Here we analyzed the behavior of five monkeys with computational models combining RL and WM. Our model-based analysis approach enables us not only to fit trial-by-trial choices but also transient slowdowns in reaction times, indicative of WM use. We found that the behavior of the five monkeys was better explained in terms of a combination of RL and WM despite inter-individual differences. The same coordination dynamics we used in a previous study in humans best explained the behavior of some monkeys, while the behavior of others showed the opposite pattern, revealing possibly different dynamics of the WM process. We further analyzed different variants of the tested models to open a discussion on how the long pretraining in these tasks may have favored particular coordination dynamics between RL and WM. This points towards either inter-species differences or protocol differences, which could be further tested in humans.