
Reward-modulated Hebbian learning of decision making.

Affiliation

Institute for Theoretical Computer Science, Graz University of Technology, A-8010 Graz, Austria.

Publication Information

Neural Comput. 2010 Jun;22(6):1399-444. doi: 10.1162/neco.2010.03-09-980.

Abstract

We introduce a framework for decision making in which the learning of decision making is reduced to its simplest and biologically most plausible form: Hebbian learning on a linear neuron. We cast our Bayesian-Hebb learning rule as reinforcement learning in which certain decisions are rewarded and prove that each synaptic weight will on average converge exponentially fast to the log odds of receiving a reward when its pre- and postsynaptic neurons are active. In our simple architecture, a particular action is selected from the set of candidate actions by a winner-take-all operation. The global reward assigned to this action then modulates the update of each synapse. Apart from this global reward signal, our reward-modulated Bayesian Hebb rule is a pure Hebb update that depends only on the coactivation of the pre- and postsynaptic neurons, not on the weighted sum of all presynaptic inputs to the postsynaptic neuron as in the perceptron learning rule or the Rescorla-Wagner rule. This simple approach to action-selection learning requires that information about sensory inputs be presented to the Bayesian decision stage in a suitably preprocessed form resulting from other adaptive processes (acting on a larger timescale) that detect salient dependencies among input features. Hence our proposed framework for fast learning of decisions also provides interesting new hypotheses regarding neural codes and computational goals of cortical areas that provide input to the final decision stage.
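As a concrete illustration of the scheme described in the abstract, the following Python/NumPy sketch implements a reward-modulated Hebbian learner of this general kind: one linear neuron per candidate action, winner-take-all action selection, and a purely local weight change gated by the coactivation of the presynaptic input and the winning (postsynaptic) neuron, modulated only by the global reward signal. The specific update formula, learning rate, and toy environment are assumptions made for this sketch rather than the paper's exact rule; the update is chosen so that each weight's fixed point is the log odds of receiving a reward given that its pre- and postsynaptic neurons are active, matching the convergence target stated above.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_actions = 8, 3           # preprocessed binary input features, candidate actions
W = np.zeros((n_actions, n_inputs))  # one linear neuron (weight row) per action
eta = 0.05                           # learning rate (assumed value)

def select_action(x, W):
    """Winner-take-all over the linear neurons' activations."""
    return int(np.argmax(W @ x))

def reward_modulated_hebb(W, x, a, r, eta):
    """Local Hebbian update: only synapses whose presynaptic input is active
    and whose postsynaptic neuron 'a' won the competition are changed,
    modulated solely by the global reward signal r in {0, 1}."""
    active = x.astype(bool)
    w = W[a, active]
    if r == 1:
        W[a, active] = w + eta * (1.0 + np.exp(-w))   # push weight up toward higher log odds
    else:
        W[a, active] = w - eta * (1.0 + np.exp(w))    # push weight down toward lower log odds
    return W

# Toy environment (an assumption for this demo): action 0 is rewarded with
# probability 0.8 whenever input feature 0 is active, and with 0.2 otherwise.
for _ in range(20000):
    x = (rng.random(n_inputs) < 0.5).astype(float)
    # epsilon-greedy exploration so every action is sampled occasionally
    a = select_action(x, W) if rng.random() > 0.1 else int(rng.integers(n_actions))
    p_reward = 0.8 if (a == 0 and x[0] == 1.0) else 0.2
    r = int(rng.random() < p_reward)
    W = reward_modulated_hebb(W, x, a, r, eta)

# The weight from feature 0 to action 0 should approach log(0.8 / 0.2)
print(W[0, 0], np.log(0.8 / 0.2))
```

In this toy setting the weight W[0, 0] should settle near log(0.8/0.2) ≈ 1.39, the log odds of reward when input feature 0 is active and action 0 is chosen; note that apart from the scalar reward, the update touches each synapse using only locally available pre/post activity.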

