Soltani Alireza, Lee Daeyeol, Wang Xiao-Jing
Department of Physics and Volen Center for Complex Systems, Brandeis University, Waltham, MA 02454, USA.
Neural Netw. 2006 Oct;19(8):1075-90. doi: 10.1016/j.neunet.2006.05.044.
Previous studies have shown that non-human primates can generate highly stochastic choice behaviour, especially when this is required during a competitive interaction with another agent. To understand the neural mechanism of such dynamic choice behaviour, we propose a biologically plausible model of decision making endowed with synaptic plasticity that follows a reward-dependent stochastic Hebbian learning rule. This model constitutes a biophysical implementation of reinforcement learning, and it reproduces salient features of behavioural data from an experiment with monkeys playing a matching pennies game. Due to interaction with an opponent and learning dynamics, the model generates quasi-random behaviour robustly in spite of intrinsic biases. Furthermore, non-random choice behaviour can also emerge when the model plays against a non-interactive opponent, as observed in the monkey experiment. Finally, when combined with a meta-learning algorithm, our model accounts for the slow drift in the animal's strategy based on a process of reward maximization.
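As a rough illustration of how a reward-dependent learning rule can produce quasi-random choices against an exploiting opponent, the sketch below simulates a toy learner playing matching pennies. It is not the biophysical model from the paper. It assumes a mean-field update on the fraction of potentiated synapses per option (`c`), a sigmoid choice rule with width `sigma`, and a simplified opponent that only exploits the learner's overall choice frequency. The function name, parameter values, and opponent algorithm are all hypothetical choices made for this example.

```python
import math
import random


def simulate_matching_pennies(n_trials=5000, q_plus=0.1, q_minus=0.1,
                              sigma=0.1, c_init=(0.8, 0.2), seed=0):
    """Toy learner vs. a frequency-exploiting opponent (illustrative only)."""
    rng = random.Random(seed)
    c = list(c_init)   # assumed: fraction of potentiated synapses per option
    counts = [1, 1]    # opponent's tally of the learner's past choices
    choices = []
    for _ in range(n_trials):
        # Sigmoid choice rule on the difference in synaptic strength.
        p_first = 1.0 / (1.0 + math.exp(-(c[0] - c[1]) / sigma))
        choice = 0 if rng.random() < p_first else 1

        # Simplified opponent: predicts the learner's choice from its past
        # choice frequency and plays the other option with that probability.
        freq_first = counts[0] / (counts[0] + counts[1])
        opponent = 1 if rng.random() < freq_first else 0

        # The learner is rewarded when its choice matches the opponent's.
        if choice == opponent:
            # Reward: potentiate synapses of the chosen option.
            c[choice] += q_plus * (1.0 - c[choice])
        else:
            # No reward: depress synapses of the chosen option.
            c[choice] -= q_minus * c[choice]

        counts[choice] += 1
        choices.append(choice)
    return choices, c


choices, c = simulate_matching_pennies()
late = choices[1000:]
print("fraction of first-option choices:", late.count(0) / len(late))
```

The initial values `c_init=(0.8, 0.2)` stand in for an intrinsic bias toward the first option. In this toy setting the opponent makes a biased choice less rewarding, so the learning rule pushes the choice frequency back toward one half, which mirrors the qualitative claim in the abstract.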