Kaveh Akbarzadeh-Sherbaf, Behrooz Abdoli, Saeed Safari, Abdol-Hossein Vahabie
High Performance Embedded Architecture Lab., School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran.
School of Cognitive Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran.
Front Neurosci. 2018 Oct 9;12:698. doi: 10.3389/fnins.2018.00698. eCollection 2018.
Human intelligence relies on a vast number of neurons and the interconnections that form a parallel computing engine. If we intend to design a brain-like machine, we have no choice but to employ many spiking neurons, each with a large number of synapses. Such a neuronal network is not only compute-intensive but also memory-intensive. The performance and configurability of modern FPGAs make them a suitable hardware solution for these challenges. This paper presents a scalable architecture to simulate a randomly connected network of Hodgkin-Huxley neurons. To demonstrate that our architecture eliminates the need for a high-end device, we target the XC7A200T, a member of the mid-range Xilinx Artix®-7 family. A set of techniques is proposed to reduce the memory usage and computational requirements. We introduce a multi-core architecture in which each core updates the states of a group of neurons stored in its corresponding memory bank. The proposed system uses a novel method to generate the connectivity vectors on the fly instead of storing them in a huge memory; this technique is based on cyclic permutations of a single prestored connectivity vector per core. Moreover, to further reduce both resource usage and computational latency, a novel approximate two-level counter is introduced to count the number of spikes arriving at a synapse in the sparse network. The first level is a low-cost saturating counter implemented in FPGA lookup tables, which reduces the number of inputs to the second-level exact adder tree and therefore yields a much lower hardware cost for the counter circuit. These techniques, along with pipelining, make it possible to build a high-performance, scalable architecture that can be configured either for real-time simulation of up to 5,120 neurons or for large-scale simulation of up to 65,536 neurons in a reasonable execution time on a cost-optimized FPGA.
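The on-the-fly connectivity scheme described above can be illustrated with a minimal Python sketch. This is an assumption-laden model, not the authors' RTL: the function names, the per-core base-vector generation, and the specific rotation convention are hypothetical; the point is only that each neuron's connectivity row is a cyclic permutation of one prestored vector, so no per-neuron row needs to be kept in memory.

```python
import random


def make_base_vector(n, p, seed=0):
    """Hypothetical generator for the single prestored connectivity
    bit-vector of a core: n bits, each set with probability p (sparse)."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]


def row(base, i):
    """Connectivity row for neuron i, produced on the fly as a cyclic
    rotation of the base vector by i positions (rotation direction is
    an assumption), instead of being read from a large stored matrix."""
    n = len(base)
    return [base[(j - i) % n] for j in range(n)]
```

Because every row is a rotation of the same vector, each neuron has the same in-degree, and the storage per core drops from n×n bits to n bits.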
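The two-level spike counter can likewise be sketched in software. The group size and saturation cap below are illustrative assumptions (the paper's actual LUT partitioning is not given here); the sketch only shows the stated idea that a cheap saturating first level feeds fewer inputs into an exact second-level adder tree, and that the result is exact for sparse spike patterns while saturating for dense ones.

```python
def saturating_count(bits, cap):
    """Level 1: low-cost saturating counter over a small group of synapse
    spike lines, modelling an FPGA LUT-based counter that clips at `cap`."""
    return min(sum(bits), cap)


def two_level_count(spikes, group=4, cap=2):
    """Level 2: exact summation (an adder tree in hardware) over the
    saturated per-group counts. Exact while no group exceeds `cap` spikes,
    which is the common case in a sparse network; otherwise a lower bound."""
    partials = [saturating_count(spikes[i:i + group], cap)
                for i in range(0, len(spikes), group)]
    return sum(partials)
```

With group=4 and cap=2, the adder tree sums one 2-bit value per four synapses instead of four 1-bit lines, which is where the hardware saving comes from.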