Sohrab Saeb, Cornelius Weber, Jochen Triesch
Frankfurt Institute for Advanced Studies, Goethe University, Frankfurt am Main, Germany.
Neural Netw. 2009 Jul-Aug;22(5-6):586-92. doi: 10.1016/j.neunet.2009.06.049. Epub 2009 Jul 8.
The brain is able to perform actions based on an adequate internal representation of the world, in which task-irrelevant features are ignored and incomplete sensory data are estimated. Traditionally, it is assumed that such abstract state representations are obtained purely from the statistics of sensory input, for example by unsupervised learning methods. However, more recent findings suggest an influence of the dopaminergic system, which can be modeled by a reinforcement learning approach. Standard reinforcement learning algorithms act on a single-layer network connecting the state space to the action space. Here, we introduce a feature detection stage and a memory layer, which together construct the state space for a learning agent. The memory layer consists of the state activation at the previous time step as well as the previously chosen action. We present a temporal-difference-based learning rule for training the weights from these additional inputs to the state layer. As a result, the performance of the network is maintained both in the presence of task-irrelevant features and at randomly occurring time steps during which the input is invisible. Interestingly, a goal-directed forward model emerges from the memory weights, covering only the state-action pairs that are relevant to the task. The model presents a link between reinforcement learning, feature detection, and forward models and may help to explain how reward systems recruit cortical circuits for goal-directed feature detection and prediction.
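To make the described architecture concrete, here is a minimal Python sketch of the general idea: a state layer driven jointly by a feature pathway and a memory pathway (previous state activation plus previously chosen action), with a single temporal-difference error training both the value estimate and the memory weights. All layer sizes, learning rates, and the exact update forms are illustrative assumptions; the abstract does not give the paper's equations, so this is a sketch of the technique, not the authors' implementation.

```python
import numpy as np

# Illustrative sizes only; the paper's actual dimensions are not stated in the abstract.
N_IN, N_STATE, N_ACT = 12, 4, 4
rng = np.random.default_rng(0)

W_feat = rng.normal(scale=0.1, size=(N_STATE, N_IN))             # input -> state (feature detection)
W_mem = rng.normal(scale=0.1, size=(N_STATE, N_STATE + N_ACT))   # previous state + action -> state (memory)
w_val = np.zeros(N_STATE)                                        # state -> value (critic)
gamma, alpha = 0.9, 0.05                                         # assumed discount factor and learning rate

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def state_activation(inp, prev_state, prev_act):
    """Combine the feature pathway with the memory pathway. With an
    invisible input (all zeros), the memory term alone carries the
    state estimate, as the abstract describes."""
    mem = np.concatenate([prev_state, np.eye(N_ACT)[prev_act]])
    return softmax(W_feat @ inp + W_mem @ mem)

def td_update(inp, prev_state, prev_act, reward, next_value):
    """One TD-style step: the TD error trains the critic and, crucially,
    also gates learning of the memory weights, so the emerging forward
    model covers only task-relevant state-action pairs."""
    s = state_activation(inp, prev_state, prev_act)
    delta = reward + gamma * next_value - w_val @ s   # TD error
    w_val += alpha * delta * s                        # critic update
    mem = np.concatenate([prev_state, np.eye(N_ACT)[prev_act]])
    W_mem += alpha * delta * np.outer(s, mem)         # memory-weight update
    return s, delta

# Usage: one visible step, then one step with the input blanked out.
s_prev = np.full(N_STATE, 1.0 / N_STATE)
s, delta = td_update(rng.random(N_IN), s_prev, prev_act=0, reward=0.0, next_value=0.0)
s_blind, _ = td_update(np.zeros(N_IN), s, prev_act=1, reward=1.0, next_value=0.0)
```

In this sketch, blanking the input leaves the state estimate to the memory pathway, which is the mechanism by which performance could be maintained at time steps where the input is invisible.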