Weidel Philipp, Duarte Renato, Morrison Abigail
Institute of Neuroscience and Medicine (INM-6) & Institute for Advanced Simulation (IAS-6) & JARA-Institute Brain Structure-Function Relationship (JBI-1 / INM-10), Research Centre Jülich, Jülich, Germany.
Department of Computer Science 3 - Software Engineering, RWTH Aachen University, Aachen, Germany.
Front Comput Neurosci. 2021 Mar 4;15:543872. doi: 10.3389/fncom.2021.543872. eCollection 2021.
Reinforcement learning is a paradigm that can account for how organisms learn to adapt their behavior in complex environments with sparse rewards. To partition an environment into discrete states, implementations in spiking neuronal networks typically rely on input architectures involving place cells or receptive fields specified by the researcher. This is problematic as a model for how an organism can learn appropriate behavioral sequences in unknown environments, as it fails to account for the unsupervised and self-organized nature of the required representations. Additionally, this approach presupposes knowledge on the part of the researcher of how the environment should be partitioned and represented, and scales poorly with the size or complexity of the environment. To address these issues and gain insights into how the brain generates its own task-relevant mappings, we propose a learning architecture that combines unsupervised learning on the input projections with biologically motivated clustered connectivity within the representation layer. This combination allows input features to be mapped to clusters; thus, the network self-organizes to produce clearly distinguishable activity patterns that can serve as the basis for reinforcement learning on the output projections. On the basis of the MNIST and Mountain Car tasks, we show that our proposed model performs better than either a comparable unclustered network or a clustered network with static input projections. We conclude that the combination of unsupervised learning and clustered connectivity provides a generic representational substrate suitable for further computation.
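The two-stage idea described in the abstract can be illustrated with a minimal, rate-based sketch: unsupervised competitive learning on the input projections discretizes the input space into clusters, and a simple reward-modulated update on the output projections then learns an action per cluster. This is a conceptual stand-in only, not the authors' spiking implementation; the toy two-region input data, prototype initialization, and learning rates are all illustrative assumptions.

```python
import random

random.seed(0)

# Toy data: inputs drawn from one of two regions of a 2-D feature space,
# standing in for the structured sensory inputs in the paper.
def sample(label):
    cx, cy = (0.2, 0.2) if label == 0 else (0.8, 0.8)
    return [cx + random.uniform(-0.1, 0.1),
            cy + random.uniform(-0.1, 0.1)], label

# Stage 1: unsupervised competitive learning on the input projections
# (a rate-based stand-in for spike-based plasticity). Prototypes are
# initialized at opposite corners purely to avoid dead units in this sketch.
prototypes = [[0.0, 0.0], [1.0, 1.0]]

def winner(x):
    dists = [sum((a - b) ** 2 for a, b in zip(x, p)) for p in prototypes]
    return dists.index(min(dists))

for _ in range(500):
    x, _ = sample(random.randint(0, 1))
    w = winner(x)
    for i in range(2):  # pull the winning prototype toward the input
        prototypes[w][i] += 0.05 * (x[i] - prototypes[w][i])

# Stage 2: reward-modulated learning on the output projections.
# Each cluster (discrete state) keeps one action-value estimate per action.
q = [[0.0, 0.0], [0.0, 0.0]]

for _ in range(500):
    label = random.randint(0, 1)
    x, y = sample(label)
    s = winner(x)  # discrete state = winning cluster
    # epsilon-greedy action selection
    a = q[s].index(max(q[s])) if random.random() > 0.1 else random.randint(0, 1)
    r = 1.0 if a == y else 0.0          # reward only for the correct action
    q[s][a] += 0.1 * (r - q[s][a])      # simple reward-driven update

def policy(x):
    s = winner(x)
    return q[s].index(max(q[s]))
```

After training, the clusters separate the two input regions and the readout maps each cluster to its rewarded action, mirroring the abstract's claim that self-organized, clearly distinguishable activity patterns form a usable basis for reinforcement learning.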