Haşegan Daniel, Deible Matt, Earl Christopher, D'Onofrio David, Hazan Hananel, Anwar Haroon, Neymotin Samuel A
Vilcek Institute of Graduate Biomedical Sciences, NYU Grossman School of Medicine, New York, NY, United States.
Department of Computer Science, University of Pittsburgh, Pittsburgh, PA, United States.
Front Comput Neurosci. 2022 Sep 30;16:1017284. doi: 10.3389/fncom.2022.1017284. eCollection 2022.
Artificial neural networks (ANNs) have been successfully trained to perform a wide range of sensory-motor behaviors. In contrast, the performance of spiking neuronal network (SNN) models trained to perform similar behaviors remains relatively suboptimal. In this work, we aimed to push the field of SNNs forward by exploring the potential of different learning mechanisms to achieve optimal performance. We trained SNNs to solve the CartPole reinforcement learning (RL) control problem using two learning mechanisms operating at different timescales: (1) spike-timing-dependent reinforcement learning (STDP-RL) and (2) evolutionary strategy (EVOL). Although the role of STDP-RL in biological systems is well established, several other mechanisms, not yet fully understood, work in concert with it during learning. Recreating accurate models that capture the interaction of STDP-RL with these diverse learning mechanisms is extremely difficult. EVOL is an alternative method that has been used successfully in many studies to fit model neural responses to electrophysiological recordings and, in some cases, to solve classification problems. One advantage of EVOL is that it may not need to capture all interacting components of synaptic plasticity and may thus provide a better alternative to STDP-RL. Here, we compared the performance of each algorithm after training, which revealed EVOL as a powerful method for training SNNs to perform sensory-motor behaviors. Our modeling opens up new capabilities for SNNs in RL and could serve as a testbed for neurobiologists aiming to understand multi-timescale learning mechanisms and dynamics in neuronal circuits.
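The abstract does not give implementation detail, but the EVOL mechanism it describes is an evolutionary-strategies update in the style of Salimans et al. (2017). The following is a minimal sketch, assuming Gaussian perturbations of the synaptic weight vector and a reward-weighted average update; here episode_reward is a hypothetical stand-in for running the spiking network through a CartPole episode and summing its reward, and the hyperparameter names (sigma, alpha, n_perturbations) are illustrative, not the paper's values.

    # Sketch of one EVOL (evolutionary strategies) generation for training
    # network weights on an episodic RL task. NOT the paper's code: the
    # fitness function below is a toy stand-in for an SNN CartPole episode.
    import numpy as np

    rng = np.random.default_rng(0)

    def episode_reward(weights: np.ndarray) -> float:
        """Hypothetical fitness: in the paper this would simulate the SNN
        with these synaptic weights on CartPole and return cumulative
        reward. A toy quadratic objective keeps the sketch runnable."""
        target = np.linspace(-1.0, 1.0, weights.size)
        return -float(np.sum((weights - target) ** 2))

    def evol_step(weights, sigma=0.1, alpha=0.05, n_perturbations=50):
        """One generation: sample Gaussian weight perturbations, evaluate
        each candidate, then move along the reward-weighted direction."""
        noise = rng.standard_normal((n_perturbations, weights.size))
        rewards = np.array(
            [episode_reward(weights + sigma * eps) for eps in noise]
        )
        # Normalize rewards so the step size is invariant to reward scale.
        rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        gradient_estimate = noise.T @ rewards / (n_perturbations * sigma)
        return weights + alpha * gradient_estimate

    weights = rng.standard_normal(32) * 0.1
    for generation in range(200):
        weights = evol_step(weights)
    print("final fitness:", episode_reward(weights))

Note the contrast with STDP-RL: the ES update above only needs a scalar per-episode fitness and treats the network as a black box, whereas STDP-RL adjusts each synapse online from the timing of its pre- and postsynaptic spikes gated by a reward signal, which is why it is sensitive to the other, less well understood plasticity mechanisms the abstract mentions.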