Institute for Theoretical Computer Science, Graz University of Technology, Graz, Austria.
PLoS Comput Biol. 2010 Aug 19;6(8):e1000894. doi: 10.1371/journal.pcbi.1000894.
Humans and animals are able to learn complex behaviors based on a massive stream of sensory information from different modalities. Early animal studies have identified learning mechanisms that are based on reward and punishment such that animals tend to avoid actions that lead to punishment whereas rewarded actions are reinforced. However, most algorithms for reward-based learning are only applicable if the dimensionality of the state-space is sufficiently small or its structure is sufficiently simple. Therefore, the question arises how the problem of learning on high-dimensional data is solved in the brain. In this article, we propose a biologically plausible generic two-stage learning system that can directly be applied to raw high-dimensional input streams. The system is composed of a hierarchical slow feature analysis (SFA) network for preprocessing and a simple neural network on top that is trained based on rewards. We demonstrate by computer simulations that this generic architecture is able to learn quite demanding reinforcement learning tasks on high-dimensional visual input streams in a time that is comparable to the time needed when an explicit highly informative low-dimensional state-space representation is given instead of the high-dimensional visual input. The learning speed of the proposed architecture in a task similar to the Morris water maze task is comparable to that found in experimental studies with rats. This study thus supports the hypothesis that slowness learning is one important unsupervised learning principle utilized in the brain to form efficient state representations for behavioral learning.
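The preprocessing stage described above is based on slow feature analysis (SFA), which extracts the most slowly varying directions of a time series. As a rough illustration of the principle (not the paper's hierarchical SFA network), the linear case can be sketched in NumPy: whiten the input, then take the directions in which the temporal derivative has the smallest variance. The function name and test signals below are illustrative.

```python
import numpy as np

def linear_sfa(X, n_features=2):
    """Linear slow feature analysis on a time series X of shape (T, d).

    Returns the n_features slowest output signals, each with (approximately)
    zero mean and unit variance, ordered from slowest to fastest.
    """
    # Center the data.
    X = X - X.mean(axis=0)

    # Whiten: decorrelate the inputs and scale them to unit variance.
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    W = evecs / np.sqrt(evals)          # whitening matrix, shape (d, d)
    Z = X @ W                           # whitened signals, cov(Z) ≈ I

    # Slowness objective: minimize the variance of the temporal difference.
    # In the whitened space this reduces to an eigenvalue problem on the
    # covariance of the finite differences; small eigenvalues = slow features.
    dZ = np.diff(Z, axis=0)
    dcov = np.cov(dZ, rowvar=False)
    _, devecs = np.linalg.eigh(dcov)    # eigenvalues in ascending order

    return Z @ devecs[:, :n_features]   # slowest directions first
```

On a linear mixture of a slow and a fast sinusoid, the first SFA output recovers the slow source (up to sign), which is the property the paper exploits to turn a raw high-dimensional visual stream into a compact state representation for the reward-based learner on top.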