Statistical mechanics of reward-modulated learning in decision-making networks.

Affiliation

Japan Science Technology Agency, ERATO, Okanoya Emotional Information Project, 351-0198 Saitama, Japan.

Publication information

Neural Comput. 2012 May;24(5):1230-70. doi: 10.1162/NECO_a_00264. Epub 2012 Feb 1.

Abstract

The neural substrates of decision making have been studied intensively using experimental and computational approaches. Investigations of decision making often employ alternative-choice tasks with reinforcement. In many experiments, choice behavior has been found empirically to follow Herrnstein's matching law. A number of theoretical studies have sought to explain the mechanisms responsible for matching behavior, and various learning rules have been proved in these studies to achieve matching behavior as a steady state of the learning process. The models in these studies have consisted of a few parameters. However, a large number of neurons and synapses are expected to participate in decision making in the brain. We investigated learning behavior in simple but large-scale decision-making networks. We considered the covariance learning rule, which has been demonstrated to achieve matching behavior as a steady state (Loewenstein & Seung, 2006). We analyzed model behavior in the thermodynamic limit, where the number of plastic synapses goes to infinity. Using techniques from statistical mechanics, we derived deterministic differential equations in this limit for the order parameters, which allow an exact calculation of the evolution of choice behavior. We found that matching behavior cannot be a steady state of learning when the fluctuations in input from individual sensory neurons are so large that they affect the net input to value-encoding neurons. This situation arises naturally when the synaptic strength is sufficiently strong and the excitatory and inhibitory inputs to the value-encoding neurons are balanced. The deviation from matching behavior is caused by increasing variance in the input potential due to the diffusion of synaptic efficacies. This effect causes undermatching, a phenomenon that has often been observed in behavioral experiments.
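To make the terms "matching" and "undermatching" concrete, the following minimal sketch evaluates the generalized matching law, a standard behavioral description in which the choice ratio is a power function of the income ratio. It is an illustration only and is not the network model analyzed in the paper. The function name `choice_fraction` and the parameters `sensitivity` and `bias` are assumptions introduced here: a sensitivity of 1 reproduces strict matching, and a sensitivity below 1 gives undermatching, that is, choice fractions pulled toward 0.5.

```python
def choice_fraction(income_a, income_b, sensitivity=1.0, bias=1.0):
    """Fraction of choices allocated to option A under the generalized matching law.

    Hypothetical helper for illustration; not taken from the paper.
    choice ratio N_A / N_B = bias * (income_a / income_b) ** sensitivity
    sensitivity == 1 and bias == 1 -> strict (Herrnstein) matching
    sensitivity < 1                -> undermatching
    """
    if income_a <= 0 or income_b <= 0:
        raise ValueError("incomes must be positive")
    ratio = bias * (income_a / income_b) ** sensitivity
    # Convert the choice ratio N_A / N_B into the fraction N_A / (N_A + N_B).
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    # Option A yields three times the income of option B.
    for s in (1.0, 0.8, 0.5):
        print(f"sensitivity={s:.1f}  fraction_A={choice_fraction(3.0, 1.0, s):.3f}")
```

With strict matching, the fraction of choices for option A equals its fraction of the total income (0.75 for a 3:1 income ratio); lowering the sensitivity moves the fraction toward indifference, which is the direction of deviation the abstract describes.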

