Akl Mahmoud, Ergene Deniz, Walter Florian, Knoll Alois
Chair of Robotics, Artificial Intelligence and Embedded Systems, TUM School of Computation, Information and Technology, Technische Universität München, Munich, Germany.
Front Neurorobot. 2023 Jan 20;16:1075647. doi: 10.3389/fnbot.2022.1075647. eCollection 2022.
Deep reinforcement learning (DRL) combines reinforcement learning algorithms with deep neural networks (DNNs). Spiking neural networks (SNNs) have been shown to be a biologically plausible and energy-efficient alternative to DNNs. Since the introduction of surrogate gradient approaches, which made it possible to overcome the discontinuity of the spike function, SNNs can be trained with the backpropagation through time (BPTT) algorithm. While largely explored on supervised learning problems, little work has investigated the use of SNNs as function approximators in DRL. Here we show how SNNs can be applied to different DRL algorithms, such as Deep Q-Network (DQN) and Twin-Delayed Deep Deterministic Policy Gradient (TD3), for discrete and continuous action space environments, respectively. We found that SNNs are sensitive to the additional hyperparameters introduced by spiking neuron models, such as current and voltage decay factors and firing thresholds, and that extensive hyperparameter tuning is inevitable. However, we show that increasing the simulation time of SNNs, as well as applying a two-neuron encoding to the input observations, helps reduce the sensitivity to the membrane parameters. Furthermore, we show that randomizing the membrane parameters, instead of selecting uniform values for all neurons, has a stabilizing effect on training. We conclude that SNNs can be utilized for learning complex continuous control problems with state-of-the-art DRL algorithms. While the training complexity increases, the resulting SNNs can be executed directly on neuromorphic processors and potentially benefit from their high energy efficiency.
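To make the techniques named in the abstract concrete, the sketch below shows a minimal leaky integrate-and-fire (LIF) layer with a surrogate gradient, the membrane hyperparameters mentioned (current decay, voltage decay, firing threshold), optional per-neuron randomization of those parameters, and a two-neuron encoding of observations. This is an illustrative assumption of how such components are commonly implemented in PyTorch, not the authors' code; all names and default values are hypothetical.

```python
# Illustrative sketch only: a LIF layer trained with BPTT via a surrogate gradient.
# Not taken from the paper; parameter names and defaults are assumptions.
import torch


class SpikeFn(torch.autograd.Function):
    """Heaviside spike in the forward pass, rectangular surrogate gradient in the backward pass."""

    @staticmethod
    def forward(ctx, v_minus_threshold):
        ctx.save_for_backward(v_minus_threshold)
        return (v_minus_threshold > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (v_minus_threshold,) = ctx.saved_tensors
        # Pass gradients only near the firing threshold.
        surrogate = (v_minus_threshold.abs() < 0.5).float()
        return grad_output * surrogate


class LIFLayer(torch.nn.Module):
    def __init__(self, in_features, out_features,
                 current_decay=0.5, voltage_decay=0.75, threshold=1.0,
                 randomize_membrane=False):
        super().__init__()
        self.fc = torch.nn.Linear(in_features, out_features)
        if randomize_membrane:
            # Per-neuron random membrane parameters (the abstract reports this stabilizes training).
            self.current_decay = torch.rand(out_features)
            self.voltage_decay = torch.rand(out_features)
        else:
            self.current_decay = torch.full((out_features,), current_decay)
            self.voltage_decay = torch.full((out_features,), voltage_decay)
        self.threshold = threshold

    def forward(self, x_seq):
        # x_seq: (time_steps, batch, in_features); unrolled over simulation time for BPTT.
        current = torch.zeros(x_seq.shape[1], self.fc.out_features)
        voltage = torch.zeros_like(current)
        spikes = []
        for x_t in x_seq:
            current = self.current_decay * current + self.fc(x_t)
            voltage = self.voltage_decay * voltage + current
            spike = SpikeFn.apply(voltage - self.threshold)
            voltage = voltage * (1.0 - spike)  # reset neurons that fired
            spikes.append(spike)
        return torch.stack(spikes)


def two_neuron_encode(obs):
    """Two-neuron encoding: each observation dimension is split into a positive
    and a negative channel so that all inputs to the SNN are non-negative."""
    return torch.cat([torch.clamp(obs, min=0.0), torch.clamp(-obs, min=0.0)], dim=-1)
```

In a DQN- or TD3-style agent, a network built from such layers would replace the usual DNN function approximator: the encoded observation is repeated over the simulation time steps, and the output spike counts or rates are read out as Q-values or actions.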