Mysore Nishant, Hota Gopabandhu, Deiss Stephen R, Pedroni Bruno U, Cauwenberghs Gert
Integrated Systems Neuroengineering Laboratory, Department of Bioengineering, University of California, San Diego, La Jolla, CA, United States.
Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA, United States.
Front Neurosci. 2022 Jan 31;15:797654. doi: 10.3389/fnins.2021.797654. eCollection 2021.
We present an efficient and scalable partitioning method for mapping large-scale neural network models with locally dense and globally sparse connectivity onto reconfigurable neuromorphic hardware. Scalability in computational efficiency, i.e., the fraction of time spent in actual computation, remains a major challenge in very large networks. Most partitioning algorithms also struggle to scale with network workload when searching for a globally optimal partition and mapping it efficiently onto hardware. As communication is regarded as the most energy- and time-consuming part of such distributed processing, the partitioning framework is optimized for compute-balanced, memory-efficient parallel processing, targeting low-latency execution and dense synaptic storage with minimal routing across compute cores. We demonstrate highly scalable and efficient partitioning for connectivity-aware, hierarchical address-event-routing resource-optimized mapping, recursively reducing total communication volume by a significant margin compared to random balanced assignment. We showcase results on synthetic networks with varying sparsity and fan-out, small-world networks, feed-forward networks, and a hemibrain connectome reconstruction of the fruit-fly brain. The combination of our method and practical results suggests a promising path toward very large-scale networks and scalable hardware-aware partitioning.
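The core idea of communication-minimizing, compute-balanced partitioning can be illustrated with a minimal balanced graph-bisection sketch. The greedy pairwise-swap refinement below (a simplified Kernighan-Lin-style pass) is an illustrative stand-in for generic edge-cut minimization, not the authors' algorithm; node labels and the example graph are hypothetical.

```python
import random

def edge_cut(adj, part):
    """Count edges whose endpoints fall in different partitions
    (a proxy for inter-core communication volume)."""
    return sum(1 for u in adj for v in adj[u] if u < v and part[u] != part[v])

def balanced_bisect(adj, iters=50, seed=0):
    """Greedy balanced bisection: start from a random balanced split,
    then repeatedly swap the cross-partition node pair that most
    reduces the edge cut, preserving balance at every step."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    rng.shuffle(nodes)
    half = len(nodes) // 2
    part = {u: (0 if i < half else 1) for i, u in enumerate(nodes)}
    for _ in range(iters):
        best_gain, best_pair = 0, None
        for u in nodes:
            for v in nodes:
                if part[u] == 0 and part[v] == 1:
                    before = edge_cut(adj, part)
                    part[u], part[v] = 1, 0          # trial swap
                    gain = before - edge_cut(adj, part)
                    part[u], part[v] = 0, 1          # undo
                    if gain > best_gain:
                        best_gain, best_pair = gain, (u, v)
        if best_pair is None:                        # local optimum reached
            break
        u, v = best_pair
        part[u], part[v] = 1, 0                      # commit best swap
    return part

# Two dense 4-cliques joined by a single bridge edge (3-4): a toy
# "locally dense, globally sparse" network. A good bisection should
# keep each clique on one core, leaving only the bridge as cut.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
       4: {3, 5, 6, 7}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6}}
part = balanced_bisect(adj)
```

Recursive application of such a bisection to each half yields a hierarchical assignment of neurons to cores; production partitioners replace the O(n^2) swap search with gain buckets and multilevel coarsening to reach the scales discussed in the paper.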