Statistical mechanics of reward-modulated learning in decision-making networks.

Affiliation

Japan Science Technology Agency, ERATO, Okanoya Emotional Information Project, 351-0198 Saitama, Japan.

Publication information

Neural Comput. 2012 May;24(5):1230-70. doi: 10.1162/NECO_a_00264. Epub 2012 Feb 1.

DOI: 10.1162/NECO_a_00264
PMID: 22295982

Abstract

The neural substrates of decision making have been intensively studied using experimental and computational approaches. Alternative-choice tasks with reinforcement have often been employed in investigations into decision making. In many experiments, choice behavior has been found empirically to follow Herrnstein's matching law. A number of theoretical studies have sought to explain the mechanisms responsible for matching behavior. Various learning rules have been proved in these studies to achieve matching behavior as a steady state of the learning process. The models in these studies have consisted of only a few parameters. However, a large number of neurons and synapses are expected to participate in decision making in the brain. We investigated learning behavior in simple but large-scale decision-making networks. We considered the covariance learning rule, which has been demonstrated to achieve matching behavior as a steady state (Loewenstein & Seung, 2006). We analyzed model behavior in a thermodynamic limit where the number of plastic synapses goes to infinity. Using techniques from statistical mechanics, we derive deterministic differential equations for the order parameters in this limit, which allow an exact calculation of the evolution of choice behavior. As a result, we found that matching behavior cannot be a steady state of learning when the fluctuations in input from individual sensory neurons are so large that they affect the net input to value-encoding neurons. This situation naturally arises when the synaptic strength is sufficiently strong and the excitatory and inhibitory inputs to the value-encoding neurons are balanced. The deviation from matching behavior is caused by increasing variance in the input potential due to the diffusion of synaptic efficacies. This effect causes an undermatching phenomenon, which has often been observed in behavioral experiments.
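
To make the terms "matching" and "undermatching" concrete, the following is a minimal sketch based on the generalized matching law, a standard behavioral description that is not part of the paper's network model. The function name `choice_fraction` and the parameters `sensitivity` and `bias` are illustrative assumptions: `sensitivity = 1` with `bias = 1` gives strict matching, and `sensitivity < 1` gives undermatching, where choices are less extreme than the income ratio would predict.

```python
def choice_fraction(income_a, income_b, sensitivity=1.0, bias=1.0):
    """Fraction of choices given to option A under the generalized matching law.

    Assumed form (illustrative, not taken from the paper):
        choices_a / choices_b = bias * (income_a / income_b) ** sensitivity
    """
    if income_a < 0 or income_b <= 0:
        raise ValueError("income_a must be >= 0 and income_b must be > 0")
    ratio = bias * (income_a / income_b) ** sensitivity
    return ratio / (1.0 + ratio)


# Option A yields three times the reward income of option B.
strict = choice_fraction(3.0, 1.0)                   # strict matching
under = choice_fraction(3.0, 1.0, sensitivity=0.5)   # undermatching

print(f"strict matching: {strict:.3f}")
print(f"undermatching:   {under:.3f}")
```

With a 3:1 income ratio, strict matching allocates 75% of choices to the richer option, while the undermatching case stays closer to indifference (50%). This is the direction of deviation the abstract attributes to growing variance in the input potential.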


Similar articles

1
Statistical mechanics of reward-modulated learning in decision-making networks.
Neural Comput. 2012 May;24(5):1230-70. doi: 10.1162/NECO_a_00264. Epub 2012 Feb 1.
2
Robustness of learning that is based on covariance-driven synaptic plasticity.
PLoS Comput Biol. 2008 Mar 7;4(3):e1000007. doi: 10.1371/journal.pcbi.1000007.
3
Reward-dependent learning in neuronal networks for planning and decision making.
Prog Brain Res. 2000;126:217-29. doi: 10.1016/S0079-6123(00)26016-0.
4
Operant matching is a generic outcome of synaptic plasticity based on the covariance between reward and neural activity.
Proc Natl Acad Sci U S A. 2006 Oct 10;103(41):15224-9. doi: 10.1073/pnas.0505220103. Epub 2006 Sep 28.
5
A biophysically based neural model of matching law behavior: melioration by stochastic synapses.
J Neurosci. 2006 Apr 5;26(14):3731-44. doi: 10.1523/JNEUROSCI.5159-05.2006.
6
Neuron as a reward-modulated combinatorial switch and a model of learning behavior.
Neural Netw. 2013 Oct;46:62-74. doi: 10.1016/j.neunet.2013.04.010. Epub 2013 May 6.
7
The actor-critic learning is behind the matching law: matching versus optimal behaviors.
Neural Comput. 2008 Jan;20(1):227-51. doi: 10.1162/neco.2008.20.1.227.
8
[Neural mechanisms of decision making].
Brain Nerve. 2008 Sep;60(9):1017-27.
9
Reward-modulated Hebbian learning of decision making.
Neural Comput. 2010 Jun;22(6):1399-444. doi: 10.1162/neco.2010.03-09-980.
10
Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity.
Neural Comput. 2007 Jun;19(6):1468-502. doi: 10.1162/neco.2007.19.6.1468.

Cited by

1
Bayesian deterministic decision making: a normative account of the operant matching law and heavy-tailed reward history dependency of choices.
Front Comput Neurosci. 2014 Mar 4;8:18. doi: 10.3389/fncom.2014.00018. eCollection 2014.
2
Dynamical regimes in neural network models of matching behavior.
Neural Comput. 2013 Dec;25(12):3093-112. doi: 10.1162/NECO_a_00522. Epub 2013 Sep 18.