

Reward-modulated Hebbian learning of decision making.

Affiliation

Institute for Theoretical Computer Science, Graz University of Technology, A-8010 Graz, Austria.

Publication

Neural Comput. 2010 Jun;22(6):1399-444. doi: 10.1162/neco.2010.03-09-980.

DOI: 10.1162/neco.2010.03-09-980
PMID: 20141476
Abstract

We introduce a framework for decision making in which the learning of decision making is reduced to its simplest and biologically most plausible form: Hebbian learning on a linear neuron. We cast our Bayesian-Hebb learning rule as reinforcement learning in which certain decisions are rewarded and prove that each synaptic weight will on average converge exponentially fast to the log-odd of receiving a reward when its pre- and postsynaptic neurons are active. In our simple architecture, a particular action is selected from the set of candidate actions by a winner-take-all operation. The global reward assigned to this action then modulates the update of each synapse. Apart from this global reward signal, our reward-modulated Bayesian Hebb rule is a pure Hebb update that depends only on the coactivation of the pre- and postsynaptic neurons, not on the weighted sum of all presynaptic inputs to the postsynaptic neuron as in the perceptron learning rule or the Rescorla-Wagner rule. This simple approach to action-selection learning requires that information about sensory inputs be presented to the Bayesian decision stage in a suitably preprocessed form resulting from other adaptive processes (acting on a larger timescale) that detect salient dependencies among input features. Hence our proposed framework for fast learning of decisions also provides interesting new hypotheses regarding neural nodes and computational goals of cortical areas that provide input to the final decision stage.
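The abstract's central claim is that a purely local, reward-gated Hebbian update drives each synaptic weight, on average, to the log-odds of receiving a reward when its pre- and postsynaptic neurons are coactive. The sketch below illustrates one simple update with exactly that fixed point, Δw = η(r(1 + e^(−w)) − 1) applied only on coactive trials; the paper's precise rule is not reproduced in this excerpt, and the reward probability and learning rate here are illustrative assumptions, not values from the paper.

```python
import numpy as np

def reward_gated_hebb_update(w, r, eta):
    """Reward-gated update for one coactive synapse.

    The expected change is zero exactly when w = log(p / (1 - p)),
    where p = P(reward | pre and post active), so the weight drifts
    toward the log-odds of reward and then fluctuates around it.
    """
    return w + eta * (r * (1.0 + np.exp(-w)) - 1.0)

rng = np.random.default_rng(0)
p_reward = 0.8    # assumed P(reward | coactivation) -- illustrative
eta = 0.02        # learning rate -- illustrative
w = 0.0

for _ in range(50_000):
    # Binary global reward signal on a trial where pre and post are coactive.
    r = float(rng.random() < p_reward)
    w = reward_gated_hebb_update(w, r, eta)

# w now fluctuates around log(0.8 / 0.2), roughly 1.39.
```

At the network level, as the abstract describes, each candidate action would accumulate such weights from its active input features, a winner-take-all operation (an argmax over the weighted sums) would pick one action, and the resulting global reward would gate this same local update at every coactive synapse of the winner.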


Similar Articles

1. Reward-modulated Hebbian learning of decision making.
   Neural Comput. 2010 Jun;22(6):1399-444. doi: 10.1162/neco.2010.03-09-980.
2. Learning spike-based population codes by reward and population feedback.
   Neural Comput. 2010 Jul;22(7):1698-717. doi: 10.1162/neco.2010.05-09-1010.
3. Supervised learning in spiking neural networks with ReSuMe: sequence learning, classification, and spike shifting.
   Neural Comput. 2010 Feb;22(2):467-510. doi: 10.1162/neco.2009.11-08-901.
4. Reward-dependent learning in neuronal networks for planning and decision making.
   Prog Brain Res. 2000;126:217-29. doi: 10.1016/S0079-6123(00)26016-0.
5. Statistical mechanics of reward-modulated learning in decision-making networks.
   Neural Comput. 2012 May;24(5):1230-70. doi: 10.1162/NECO_a_00264. Epub 2012 Feb 1.
6. Hebbian learning in linear-nonlinear networks with tuning curves leads to near-optimal, multi-alternative decision making.
   Neural Netw. 2011 Jun;24(5):417-26. doi: 10.1016/j.neunet.2011.01.005. Epub 2011 Mar 4.
7. Bayesian spiking neurons II: learning.
   Neural Comput. 2008 Jan;20(1):118-45. doi: 10.1162/neco.2008.20.1.118.
8. A learning theory for reward-modulated spike-timing-dependent plasticity with application to biofeedback.
   PLoS Comput Biol. 2008 Oct;4(10):e1000180. doi: 10.1371/journal.pcbi.1000180. Epub 2008 Oct 10.
9. Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity.
   Neural Comput. 2007 Jun;19(6):1468-502. doi: 10.1162/neco.2007.19.6.1468.
10. Adaptive learning via selectionism and Bayesianism, Part I: connection between the two.
   Neural Netw. 2009 Apr;22(3):220-8. doi: 10.1016/j.neunet.2009.03.018. Epub 2009 Apr 5.

Cited By

1. Behavioral Analysis of EEG Signals in Loss-Gain Decision-Making Experiments.
   Behav Neurol. 2022 Jul 15;2022:3070608. doi: 10.1155/2022/3070608. eCollection 2022.
2. Introducing principles of synaptic integration in the optimization of deep neural networks.
   Nat Commun. 2022 Apr 7;13(1):1885. doi: 10.1038/s41467-022-29491-2.
3. Computational mechanisms of distributed value representations and mixed learning strategies.
   Nat Commun. 2021 Dec 10;12(1):7191. doi: 10.1038/s41467-021-27413-2.
4. Rapid Learning of Odor-Value Association in the Olfactory Striatum.
   J Neurosci. 2020 May 27;40(22):4335-4347. doi: 10.1523/JNEUROSCI.2604-19.2020. Epub 2020 Apr 22.
5. Phase-Locked Stimulation during Cortical Beta Oscillations Produces Bidirectional Synaptic Plasticity in Awake Monkeys.
   Curr Biol. 2018 Aug 20;28(16):2515-2526.e4. doi: 10.1016/j.cub.2018.07.009. Epub 2018 Aug 9.
6. A Cognitive Model Based on Neuromodulated Plasticity.
   Comput Intell Neurosci. 2016;2016:4296356. doi: 10.1155/2016/4296356. Epub 2016 Oct 30.
7. Developmental self-construction and -configuration of functional neocortical neuronal networks.
   PLoS Comput Biol. 2014 Dec 4;10(12):e1003994. doi: 10.1371/journal.pcbi.1003994. eCollection 2014 Dec.
8. Social learning in humans and other animals.
   Front Neurosci. 2014 Mar 31;8:58. doi: 10.3389/fnins.2014.00058. eCollection 2014.
9. Bayesian computation emerges in generic cortical microcircuits through spike-timing-dependent plasticity.
   PLoS Comput Biol. 2013 Apr;9(4):e1003037. doi: 10.1371/journal.pcbi.1003037. Epub 2013 Apr 25.
10. Rare neural correlations implement robotic conditioning with delayed rewards and disturbances.
   Front Neurorobot. 2013 Apr 2;7:6. doi: 10.3389/fnbot.2013.00006. eCollection 2013.