Institute for Theoretical Computer Science, Graz University of Technology, Graz, Austria.
PLoS Comput Biol. 2010 Aug 19;6(8):e1000894. doi: 10.1371/journal.pcbi.1000894.
Humans and animals are able to learn complex behaviors based on a massive stream of sensory information from different modalities. Early animal studies have identified learning mechanisms that are based on reward and punishment such that animals tend to avoid actions that lead to punishment whereas rewarded actions are reinforced. However, most algorithms for reward-based learning are only applicable if the dimensionality of the state-space is sufficiently small or its structure is sufficiently simple. Therefore, the question arises how the problem of learning on high-dimensional data is solved in the brain. In this article, we propose a biologically plausible generic two-stage learning system that can directly be applied to raw high-dimensional input streams. The system is composed of a hierarchical slow feature analysis (SFA) network for preprocessing and a simple neural network on top that is trained based on rewards. We demonstrate by computer simulations that this generic architecture is able to learn quite demanding reinforcement learning tasks on high-dimensional visual input streams in a time that is comparable to the time needed when an explicit highly informative low-dimensional state-space representation is given instead of the high-dimensional visual input. The learning speed of the proposed architecture in a task similar to the Morris water maze task is comparable to that found in experimental studies with rats. This study thus supports the hypothesis that slowness learning is one important unsupervised learning principle utilized in the brain to form efficient state representations for behavioral learning.
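The preprocessing stage described above is based on slow feature analysis (SFA), which extracts the most slowly varying directions of a time series. As a rough illustration of the principle (not the paper's hierarchical SFA network), the linear case can be sketched in NumPy: whiten the input, then take the directions in which the temporal derivative has the smallest variance. The function name and test signals below are illustrative.

```python
import numpy as np

def linear_sfa(X, n_features=2):
    """Linear slow feature analysis on a time series X of shape (T, d).

    Returns the n_features slowest output signals, each with (approximately)
    zero mean and unit variance, ordered from slowest to fastest.
    """
    # Center the data.
    X = X - X.mean(axis=0)

    # Whiten: decorrelate the inputs and scale them to unit variance.
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    W = evecs / np.sqrt(evals)          # whitening matrix, shape (d, d)
    Z = X @ W                           # whitened signals, cov(Z) ≈ I

    # Slowness objective: minimize the variance of the temporal difference.
    # In the whitened space this reduces to an eigenvalue problem on the
    # covariance of the finite differences; small eigenvalues = slow features.
    dZ = np.diff(Z, axis=0)
    dcov = np.cov(dZ, rowvar=False)
    _, devecs = np.linalg.eigh(dcov)    # eigenvalues in ascending order

    return Z @ devecs[:, :n_features]   # slowest directions first
```

On a linear mixture of a slow and a fast sinusoid, the first SFA output recovers the slow source (up to sign), which is the property the paper exploits to turn a raw high-dimensional visual stream into a compact state representation for the reward-based learner on top.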