Chi Yuze, Guo Licheng, Cong Jason
University of California, Los Angeles.
FPGA. 2022 Feb;2022:190-200. doi: 10.1145/3490422.3502358. Epub 2022 Feb 11.
The single-source shortest path (SSSP) problem is one of the most important and well-studied graph problems widely used in many application domains, such as road navigation, neural image reconstruction, and social network analysis. Although we have known various SSSP algorithms for decades, implementing one for large-scale power-law graphs efficiently is still highly challenging today, because ① a work-efficient SSSP algorithm requires priority-order traversal of graph data, ② the priority queue needs to be scalable both in throughput capacity, and ③ priority-order traversal requires extensive random memory accesses on graph data. In this paper, we present SPLAG to accelerate SSSP for power-law graphs on FPGAs. SPLAG uses a coarse-grained priority queue (CGPQ) to enable high-throughput priority-order graph traversal with a large frontier. To mitigate the high-volume random accesses, SPLAG employs a customized vertex cache (CVC) to reduce off-chip memory access and improve the throughput to read and update vertex data. Experimental results on various synthetic and real-world datasets show up to a 4.9× speedup over state-of-the-art SSSP accelerators, a 2.6× speedup over 32-thread CPU running at 4.4 GHz, and a 0.9× speedup over an A100 GPU that has 4.1× power budget and 3.4× HBM bandwidth. Such a high performance would place SPLAG in the 14th position of the Graph 500 benchmark for data intensive applications (the highest using a single FPGA) with only a 45 W power budget. SPLAG is written in high-level synthesis C++ and is fully parameterized, which means it can be easily ported to various different FPGAs with different configurations. SPLAG is open-source at https://github.com/UCLA-VAST/splag.
单源最短路径(SSSP)问题是最重要且研究充分的图问题之一,广泛应用于许多领域,如道路导航、神经图像重建和社交网络分析。尽管我们几十年来已经了解了各种SSSP算法,但如今要在大规模幂律图上高效实现一种算法仍然极具挑战性,原因如下:①一个工作高效的SSSP算法需要对图数据进行优先顺序遍历;②优先级队列需要在吞吐量容量方面都具有可扩展性;③优先顺序遍历需要对图数据进行大量的随机内存访问。在本文中,我们提出了SPLAG来加速FPGA上幂律图的SSSP。SPLAG使用粗粒度优先级队列(CGPQ)来实现具有大前沿的高吞吐量优先顺序图遍历。为了减轻大量随机访问,SPLAG采用定制顶点缓存(CVC)来减少片外内存访问并提高读取和更新顶点数据的吞吐量。在各种合成数据集和真实世界数据集上的实验结果表明,与最先进的SSSP加速器相比,加速比高达4.9倍;与运行在4.4 GHz的32线程CPU相比,加速比为2.6倍;与具有4.1倍功率预算和3.4倍HBM带宽的A100 GPU相比,加速比为0.9倍。如此高性能将使SPLAG在数据密集型应用的Graph 500基准测试中排名第14位(使用单个FPGA时排名最高),且功率预算仅为45 W。SPLAG用高级综合C++编写且完全参数化,这意味着它可以轻松移植到具有不同配置 的各种不同FPGA上。SPLAG在https://github.com/UCLA-VAST/splag上开源。