Artiles Oswaldo, Saeed Fahad
School of Computing and Information Sciences, Florida, International University, Miami, Florida, USA.
Proc Int Workshops Parallel Proc. 2021 Aug;2021. doi: 10.1145/3458744.3474047. Epub 2021 Sep 23.
Betweenness centrality (BC) is a shortest path centrality metric used to measure the influence of individual vertices or edges on huge graphs that are used for modeling and analysis of human brain, omics data, or social networks. The application of the BC algorithm to modern graphs must deal with the size of the graphs, as well with highly irregular data-access patterns. These challenges are particularly important when the BC algorithm is implemented on Graphics Processing Units (GPU), due to the limited global memory of these processors, as well as the decrease in performance due to the load unbalance resulting from processing irregular data structures. In this paper, we present the first GPU based linear-algebraic formulation and implementation of BC, called TurboBC, a set of memory efficient BC algorithms that exhibits good performance and high scalability on unweighted, undirected or directed sparse graphs of arbitrary structure. Our experiments demonstrate that our TurboBC algorithms obtain more than 18 GTEPs and an average speedup of 31.9x over the sequential version of the BC algorithm, and are on average 1.7x and 2.2x faster than the state-of-the-art algorithms implemented on the high performance, GPU-based, gunrock, and CPU-based, ligra libraries, respectively. These experiments also show that by minimizing their memory footprint, the TurboBC algorithms are able to compute the BC of relatively big graphs, for which the gunrock algorithms ran out of memory.
介数中心性(BC)是一种最短路径中心性度量,用于衡量单个顶点或边对用于人类大脑建模与分析、组学数据或社交网络的大型图的影响。将BC算法应用于现代图时,必须处理图的规模以及高度不规则的数据访问模式。当在图形处理单元(GPU)上实现BC算法时,这些挑战尤为重要,因为这些处理器的全局内存有限,而且处理不规则数据结构会导致负载不平衡,进而使性能下降。在本文中,我们提出了首个基于GPU的BC线性代数公式和实现,称为TurboBC,这是一组内存高效的BC算法,在任意结构的无加权、无向或有向稀疏图上表现出良好的性能和高可扩展性。我们的实验表明,我们的TurboBC算法获得了超过18 GTEP的性能,比BC算法的顺序版本平均加速31.9倍,并且分别比基于高性能GPU的gunrock库和基于CPU的ligra库实现的现有算法平均快1.7倍和2.2倍。这些实验还表明,通过最小化内存占用,TurboBC算法能够计算相对较大的图的BC,而gunrock算法在处理这些图时会出现内存不足的情况。