Comparison of neuronal spike exchange methods on a Blue Gene/P supercomputer.

Affiliations

Department of Computer Science, Yale University, New Haven, CT, USA.

Publication information

Front Comput Neurosci. 2011 Nov 18;5:49. doi: 10.3389/fncom.2011.00049. eCollection 2011.

Abstract

For neural network simulations on parallel machines, interprocessor spike communication can be a significant portion of the total simulation time. The performance of several spike exchange methods using a Blue Gene/P (BG/P) supercomputer has been tested with 8K-128K cores, using randomly connected networks of up to 32M cells with 1k connections per cell and 4M cells with 10k connections per cell, i.e., on the order of 4·10^10 connections (K is 1024, M is 1024^2, and k is 1000). The spike exchange methods used are the standard Message Passing Interface (MPI) collective, MPI_Allgather, and several variants of the non-blocking Multisend method, implemented either via non-blocking MPI_Isend or by exploiting the very low overhead direct memory access (DMA) communication available on the BG/P. In all cases, the worst performing method was the one using MPI_Isend, due to the high overhead of initiating each spike communication. The two best performing methods had similar performance, with very low overhead for the initiation of spike communication: the persistent Multisend method using the Record-Replay feature of the Deep Computing Messaging Framework (DCMF_Multicast), and a two-phase multisend in which a DCMF_Multicast first sends to a subset of phase-one destination cores, which then pass the spike on to their subsets of phase-two destination cores. Departure from ideal scaling for the Multisend methods is almost completely due to load imbalance caused by the large variation in the number of cells that fire on each processor during the interval between synchronizations. Spike exchange time itself is negligible, since transmission overlaps with computation and is handled by a DMA controller. We conclude that ideal performance scaling will ultimately be limited by the imbalance in incoming spikes per processor between synchronization intervals. Thus, counterintuitively, maximizing load balance requires that the distribution of cells on processors not reflect the neural net architecture but be random, so that cells which burst fire together are placed on different processors, with their targets spread over as large a set of processors as possible.
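
For orientation, here is a minimal C/MPI sketch of the two portable exchange patterns named above: the MPI_Allgather collective and a Multisend built from non-blocking MPI_Isend calls. It is an illustrative sketch, not the authors' implementation; SpikeRecord, MAX_SPIKES, and the helper names are assumptions, and the BG/P-specific DCMF_Multicast (Record-Replay) and two-phase variants are omitted because they depend on IBM's DCMF API.

/* Illustrative sketch only (not from the paper): two spike exchange patterns.
 * SpikeRecord, MAX_SPIKES, and the helper names are assumed; error checking
 * and receive-side unpacking are omitted for brevity. */
#include <mpi.h>

#define MAX_SPIKES 256                  /* assumed per-rank spike budget per interval */

typedef struct { int gid; double t; } SpikeRecord;  /* source cell id and spike time */

/* Collective variant: every rank contributes a fixed-size buffer and receives
 * every other rank's buffer; rank r's spikes land at recv[r*MAX_SPIKES], with
 * the first counts[r] entries of that slice valid. */
static void exchange_allgather(SpikeRecord send[MAX_SPIKES], int nmine,
                               SpikeRecord *recv, int *counts, MPI_Comm comm)
{
    MPI_Allgather(&nmine, 1, MPI_INT, counts, 1, MPI_INT, comm);
    MPI_Allgather(send, MAX_SPIKES * (int)sizeof(SpikeRecord), MPI_BYTE,
                  recv, MAX_SPIKES * (int)sizeof(SpikeRecord), MPI_BYTE, comm);
}

/* Multisend variant via MPI_Isend: one small non-blocking message per target
 * rank of the spiking cell.  The abstract reports this as the slowest method,
 * since every message pays the full initiation overhead. */
static void multisend_isend(SpikeRecord *spk, const int *target_ranks,
                            int ntargets, MPI_Request *reqs, MPI_Comm comm)
{
    for (int i = 0; i < ntargets; ++i)
        MPI_Isend(spk, (int)sizeof(SpikeRecord), MPI_BYTE,
                  target_ranks[i], 0 /* tag */, comm, &reqs[i]);
    /* The sender later completes with MPI_Waitall(ntargets, reqs,
     * MPI_STATUSES_IGNORE); receivers drain matching MPI_Irecv/MPI_Iprobe
     * calls while cell integration continues, overlapping communication
     * with computation. */
}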

Figure 1 (article image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/10c8/3219917/5f1affbe8a94/fncom-05-00049-g0001.jpg
