Centro de Informática - CIn, Universidade Federal de Pernambuco, Av. Jornalista Aníbal Fernandes, s/n, Cidade Universitária, 50.740-560, Brazil.
Neural Netw. 2021 Dec;144:496-506. doi: 10.1016/j.neunet.2021.09.010. Epub 2021 Sep 17.
Spiking neural networks (SNNs) aim to replicate the energy efficiency, learning speed, and temporal processing of biological brains. However, the accuracy and learning speed of such networks still lag behind reinforcement learning (RL) models based on traditional neural models. This work combines a pre-trained binary convolutional neural network with an SNN trained online through reward-modulated STDP, in order to leverage the advantages of both models. The spiking network extends its previous version, with improvements in architecture and dynamics to address a more challenging task. We focus on an extensive experimental evaluation of the proposed model against optimized state-of-the-art baselines, namely proximal policy optimization (PPO) and deep Q-network (DQN). The models are compared on a grid-world environment with high-dimensional observations, consisting of RGB images with up to 256 × 256 pixels. The experimental results show that the proposed architecture can be a competitive alternative to deep reinforcement learning (DRL) in the evaluated environment and provides a foundation for more complex future applications of spiking networks.
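The abstract's central mechanism is reward-modulated STDP, in which STDP-style spike pairings accumulate in a slowly decaying eligibility trace and a scalar reward signal gates the actual weight change. The following is a minimal single-synapse sketch of this general idea; all parameter values (trace decays, learning rate, pairing amplitudes) and the function name are illustrative assumptions, not taken from the paper.

```python
# Hypothetical minimal sketch of reward-modulated STDP (R-STDP) for a single
# synapse in discrete time. Parameter values are illustrative only.

def rstdp_step(w, elig, pre_spike, post_spike, pre_trace, post_trace, reward,
               tau_e=0.9, tau_t=0.8, a_plus=0.1, a_minus=0.12, lr=0.01):
    """One R-STDP update: STDP pairings feed an eligibility trace;
    reward converts the trace into an actual weight change."""
    # Decay the pre/post spike traces, then add any new spikes (0 or 1).
    pre_trace = tau_t * pre_trace + pre_spike
    post_trace = tau_t * post_trace + post_spike
    # STDP pairing term: potentiate post-after-pre, depress pre-after-post.
    stdp = a_plus * pre_trace * post_spike - a_minus * post_trace * pre_spike
    # Accumulate the pairing into a decaying eligibility trace
    # instead of modifying the weight immediately.
    elig = tau_e * elig + stdp
    # A later reward signal gates the weight update (no reward, no change).
    w = w + lr * reward * elig
    return w, elig, pre_trace, post_trace
```

Because the weight only moves when reward is nonzero, a pre-then-post pairing followed a few steps later by a positive reward strengthens the synapse, which is what lets the network learn online from delayed RL feedback.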