Department of Neurobiology, Alexander Silberman Institute of Life Sciences, Interdisciplinary Center for Neural Computation, Edmond and Lily Safra Center for Brain Sciences, Hebrew University, Jerusalem 91904, Israel.
J Neurosci. 2013 Jan 23;33(4):1521-34. doi: 10.1523/JNEUROSCI.2068-12.2013.
In free operant experiments, subjects alternate at will between targets that yield rewards stochastically. Behavior in these experiments is typically characterized by (1) an exponential distribution of stay durations, (2) matching of the relative time spent at a target to its relative share of the total number of rewards, and (3) adaptation after a change in the reward rates that can be very fast. The neural mechanism underlying these regularities is largely unknown. Moreover, current decision-making neural network models typically aim at explaining behavior in discrete-time experiments in which a single decision is made once in every trial, making these models hard to extend to the more natural case of free operant decisions. Here we show that a model based on attractor dynamics, in which transitions are induced by noise and preference is formed via covariance-based synaptic plasticity, can account for the characteristics of behavior in free operant experiments. We compare a specific instance of such a model, in which two recurrently excited populations of neurons compete for higher activity, to the behavior of rats responding on two levers for rewarding brain stimulation on a concurrent variable interval reward schedule (Gallistel et al., 2001). We show that the model is consistent with the rats' behavior, and in particular, with the observed fast adaptation to matching behavior. Further, we show that the neural model can be reduced to a behavioral model, and we use this model to deduce a novel "conservation law," which is consistent with the behavior of the rats.
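The three behavioral quantities named in the abstract (stay durations, relative time at a target, and relative share of rewards) can be made concrete with a small simulation. The sketch below is not the attractor network described in the paper and includes no synaptic plasticity. It assumes a simplified agent that leaves its current target with a fixed hazard rate, which gives approximately exponential stay durations, and a concurrent variable interval schedule in which each target arms a reward at a fixed rate and holds it until collected. The function names, parameter values, and time step are all illustrative assumptions. Because the leave rates are fixed rather than learned, the time share and the reward share need not match; the code only shows how each quantity would be measured.

```python
import random


def simulate_session(leave_rates, arm_rates, total_time=20000.0, dt=0.05, seed=0):
    """Simulate a two-target free operant session in small time steps.

    leave_rates[i]: assumed constant hazard (per second) of leaving target i.
    arm_rates[i]:   assumed rate (per second) at which target i arms a reward;
                    an armed reward is held until the subject collects it.
    Returns (time_at, rewards, stays) for the two targets.
    """
    rng = random.Random(seed)
    current = 0                  # index of the target currently occupied
    armed = [False, False]       # is a reward waiting at each target?
    time_at = [0.0, 0.0]         # total time spent at each target
    rewards = [0, 0]             # rewards collected at each target
    stays = ([], [])             # completed stay durations per target
    stay = 0.0                   # duration of the ongoing stay

    for _ in range(round(total_time / dt)):
        # Variable interval schedule: each target arms independently.
        for i in (0, 1):
            if not armed[i] and rng.random() < arm_rates[i] * dt:
                armed[i] = True
        # A subject present at an armed target collects the reward.
        if armed[current]:
            armed[current] = False
            rewards[current] += 1
        time_at[current] += dt
        stay += dt
        # Constant leaving hazard: stay durations are geometric in steps,
        # which approximates an exponential distribution for small dt.
        if rng.random() < leave_rates[current] * dt:
            stays[current].append(stay)
            stay = 0.0
            current = 1 - current
    return time_at, rewards, stays


def share(pair):
    """Relative share of the first target: pair[0] / (pair[0] + pair[1])."""
    return pair[0] / (pair[0] + pair[1])


if __name__ == "__main__":
    time_at, rewards, stays = simulate_session((0.2, 0.4), (0.1, 0.05))
    print("mean stay at target 0 (s):", sum(stays[0]) / len(stays[0]))
    print("relative time at target 0:", share(time_at))
    print("relative rewards at target 0:", share(rewards))
```

Under the matching behavior described in the abstract, the last two printed values would be approximately equal. In this sketch they are compared only for illustration; bringing them together would require a learning rule that adjusts the leave rates, such as the covariance-based plasticity the authors propose.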