
Reward-modulated Hebbian learning of decision making.

Affiliation

Institute for Theoretical Computer Science, Graz University of Technology, A-8010 Graz, Austria.

Publication Information

Neural Comput. 2010 Jun;22(6):1399-444. doi: 10.1162/neco.2010.03-09-980.

Abstract

We introduce a framework for decision making in which the learning of decision making is reduced to its simplest and biologically most plausible form: Hebbian learning on a linear neuron. We cast our Bayesian-Hebb learning rule as reinforcement learning in which certain decisions are rewarded and prove that each synaptic weight will on average converge exponentially fast to the log odds of receiving a reward when its pre- and postsynaptic neurons are active. In our simple architecture, a particular action is selected from the set of candidate actions by a winner-take-all operation. The global reward assigned to this action then modulates the update of each synapse. Apart from this global reward signal, our reward-modulated Bayesian Hebb rule is a pure Hebb update that depends only on the coactivation of the pre- and postsynaptic neurons, not on the weighted sum of all presynaptic inputs to the postsynaptic neuron as in the perceptron learning rule or the Rescorla-Wagner rule. This simple approach to action-selection learning requires that information about sensory inputs be presented to the Bayesian decision stage in a suitably preprocessed form resulting from other adaptive processes (acting on a larger timescale) that detect salient dependencies among input features. Hence our proposed framework for fast learning of decisions also provides interesting new hypotheses regarding neural codes and computational goals of cortical areas that provide input to the final decision stage.
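As a concrete illustration of the scheme described in the abstract, the following Python/NumPy sketch implements a reward-modulated Hebbian learner of this general kind: one linear neuron per candidate action, winner-take-all action selection, and a purely local weight change gated by the coactivation of the presynaptic input and the winning (postsynaptic) neuron, modulated only by the global reward signal. The specific update formula, learning rate, and toy environment are assumptions made for this sketch rather than the paper's exact rule; the update is chosen so that each weight's fixed point is the log odds of receiving a reward given that its pre- and postsynaptic neurons are active, matching the convergence target stated above.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_actions = 8, 3           # preprocessed binary input features, candidate actions
W = np.zeros((n_actions, n_inputs))  # one linear neuron (weight row) per action
eta = 0.05                           # learning rate (assumed value)

def select_action(x, W):
    """Winner-take-all over the linear neurons' activations."""
    return int(np.argmax(W @ x))

def reward_modulated_hebb(W, x, a, r, eta):
    """Local Hebbian update: only synapses whose presynaptic input is active
    and whose postsynaptic neuron 'a' won the competition are changed,
    modulated solely by the global reward signal r in {0, 1}."""
    active = x.astype(bool)
    w = W[a, active]
    if r == 1:
        W[a, active] = w + eta * (1.0 + np.exp(-w))   # push weight up toward higher log odds
    else:
        W[a, active] = w - eta * (1.0 + np.exp(w))    # push weight down toward lower log odds
    return W

# Toy environment (an assumption for this demo): action 0 is rewarded with
# probability 0.8 whenever input feature 0 is active, and with 0.2 otherwise.
for _ in range(20000):
    x = (rng.random(n_inputs) < 0.5).astype(float)
    # epsilon-greedy exploration so every action is sampled occasionally
    a = select_action(x, W) if rng.random() > 0.1 else int(rng.integers(n_actions))
    p_reward = 0.8 if (a == 0 and x[0] == 1.0) else 0.2
    r = int(rng.random() < p_reward)
    W = reward_modulated_hebb(W, x, a, r, eta)

# The weight from feature 0 to action 0 should approach log(0.8 / 0.2)
print(W[0, 0], np.log(0.8 / 0.2))
```

In this toy setting the weight W[0, 0] should settle near log(0.8/0.2) ≈ 1.39, the log odds of reward when input feature 0 is active and action 0 is chosen; note that apart from the scalar reward, the update touches each synapse using only locally available pre/post activity.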

