Alireza Soltani, Xiao-Jing Wang
Volen Center for Complex Systems, Department of Physics, Brandeis University, Waltham, Massachusetts 02454, USA.
J Neurosci. 2006 Apr 5;26(14):3731-44. doi: 10.1523/JNEUROSCI.5159-05.2006.
In experiments designed to uncover the neural basis of adaptive decision making in a foraging environment, neuroscientists have reported single-cell activities in the lateral intraparietal cortex (LIP) that are correlated with choice options and their subjective values. To investigate the underlying synaptic mechanism, we considered a spiking neuron model of decision making endowed with synaptic plasticity that follows a reward-dependent stochastic Hebbian learning rule. This general model is tested in a matching task in which rewards on two targets are scheduled randomly with different rates. Our main results are threefold. First, we show that plastic synapses provide a natural way to integrate past rewards and estimate the local (in time) "return" of a choice. Second, our model reproduces matching behavior (i.e., the proportional allocation of choices matches the relative reinforcement obtained on those choices), which is achieved through melioration on individual trials. Our model also explains the observed "undermatching" phenomenon and points to biophysical constraints (such as a finite learning rate and stochastic neuronal firing) that limit matching behavior. Third, although our decision model is an attractor network exhibiting winner-take-all competition, it captures the graded spiking activities observed in LIP when those activities are sorted according to the choices and the difference in the returns for the two targets. These results suggest that neurons in LIP are involved in selecting the oculomotor responses, whereas rewards are integrated and stored elsewhere, possibly by plastic synapses and in the form of the return rather than the income of choice options.
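The learning rule and matching task summarized above can be illustrated with a minimal, reduced sketch. It is not the authors' spiking network, and it rests on assumptions that this abstract does not state: (i) each target's neural population receives plastic synapses whose potentiated fraction `c` changes only when that target is chosen (the Hebbian condition, since only the winning population is active). After a reward `c` grows by `q_pot * (1 - c)`, and after no reward it shrinks by `q_dep * c`, which is the expected change when binary synapses flip stochastically with those probabilities. (ii) The attractor network's winner-take-all choice is replaced by a logistic function of `c[0] - c[1]` with a noise parameter `sigma`. (iii) Rewards follow a baited schedule: each target becomes baited with probability `p_bait[i]` per trial and stays baited until chosen. The names `simulate_matching`, `p_bait`, `q_pot`, `q_dep` and `sigma` are hypothetical. Because `c[k]` is updated only on trials where target `k` is chosen, it tracks rewards per choice (return) rather than rewards per trial (income).

```python
import math
import random


def simulate_matching(p_bait=(0.3, 0.1), q_pot=0.1, q_dep=0.1, sigma=0.1,
                      n_trials=20000, seed=0):
    """Hypothetical reduced model: mean-field reward-dependent Hebbian synapses
    plus a logistic choice rule on a two-target baited schedule (assumptions)."""
    rng = random.Random(seed)
    c = [0.5, 0.5]           # expected fraction of potentiated synapses per target population
    baited = [False, False]  # whether a reward is currently waiting on each target
    choices = [0, 0]         # number of times each target was chosen
    rewards = [0, 0]         # number of rewards collected from each target

    for _ in range(n_trials):
        # Assumed baiting schedule: an unbaited target becomes baited with
        # probability p_bait[i]; the bait persists until that target is chosen.
        for i in range(2):
            if not baited[i] and rng.random() < p_bait[i]:
                baited[i] = True

        # Stand-in for stochastic winner-take-all competition:
        # probability of choosing target 0 is a logistic function of c[0] - c[1].
        p_choose0 = 1.0 / (1.0 + math.exp(-(c[0] - c[1]) / sigma))
        k = 0 if rng.random() < p_choose0 else 1

        rewarded = baited[k]
        baited[k] = False    # collecting the target removes its bait
        choices[k] += 1
        rewards[k] += int(rewarded)

        # Plasticity only at synapses onto the chosen (active) population.
        if rewarded:
            c[k] += q_pot * (1.0 - c[k])  # depressed synapses potentiate
        else:
            c[k] -= q_dep * c[k]          # potentiated synapses depress

    return choices, rewards, c


if __name__ == "__main__":
    choices, rewards, c = simulate_matching()
    print("choice fraction, target 0:", choices[0] / sum(choices))
    print("reward fraction, target 0:", rewards[0] / sum(rewards))
    print("synaptic estimates of return:", c)
```

Comparing the two printed fractions is a test of the matching law: equality is matching, and a choice fraction closer to 0.5 than the reward fraction is undermatching. Varying `sigma` (choice noise) or `q_pot`/`q_dep` (learning rate) is one way to probe, in this toy setting, how such constraints affect matching.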