Cross-platform programming model for many-core lattice Boltzmann simulations.

Author information

Latt Jonas, Coreixas Christophe, Beny Joël

Affiliation

Computer Science Department, University of Geneva, Carouge, Switzerland.

Publication information

PLoS One. 2021 Apr 29;16(4):e0250306. doi: 10.1371/journal.pone.0250306. eCollection 2021.

DOI:10.1371/journal.pone.0250306

PMID:33914788

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8084255/

Abstract

We present a novel, hardware-agnostic implementation strategy for lattice Boltzmann (LB) simulations that delivers high performance on homogeneous and heterogeneous many-core platforms. Based solely on C++17 Parallel Algorithms, our approach does not rely on any language extensions, external libraries, vendor-specific code annotations, or pre-compilation steps. Thanks in particular to a recently proposed GPU back-end to C++17 Parallel Algorithms, we show that a single code can compile and reach state-of-the-art performance in both many-core CPU and GPU environments when solving a given nontrivial fluid dynamics problem. The proposed strategy is evaluated with six different, commonly used implementation schemes to assess the performance impact of memory access patterns on different platforms. Nine different LB collision models are included in the tests and exhibit good performance, demonstrating the versatility of our parallel approach. This work shows that it is less necessary than ever to draw a distinction between research and production software, as a concise and generic LB implementation yields performance comparable to that achievable in a hardware-specific programming language. The results also highlight the performance gains achieved by modern many-core CPUs and their apparent ability to narrow the gap with GPU platforms, which have traditionally been far faster. All code is made available to the community in the form of the open-source project stlbm, which serves both as stand-alone simulation software and as a collection of reusable patterns for accelerating pre-existing LB codes.
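To make the idea of an LB kernel written purely with C++17 Parallel Algorithms concrete, the following is a minimal sketch and is not taken from stlbm. It assumes a one-dimensional D1Q3 lattice with a plain BGK collision and no streaming step. The names `Cell`, `collide_all`, `mass`, `momentum`, and the macro `SKETCH_USE_PARALLEL_POLICY` are hypothetical and introduced only for illustration. The macro exists so that the sketch also builds with toolchains that lack a parallel back-end; when it is defined, the same lambda is dispatched through `std::execution::par_unseq`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef SKETCH_USE_PARALLEL_POLICY   // hypothetical switch, not part of stlbm
#include <execution>                // C++17 execution policies
#endif

// One lattice cell of a D1Q3 model: populations for velocities -1, 0, +1.
using Cell = std::array<double, 3>;

// Zeroth and first moments of a cell (density and momentum).
inline double mass(const Cell& f) { return f[0] + f[1] + f[2]; }
inline double momentum(const Cell& f) { return f[2] - f[0]; }

// Apply a BGK collision to every cell through a standard algorithm.
// The per-cell work is a lambda with no shared mutable state, which is
// what allows the loop to be handed to a parallel execution policy.
inline void collide_all(std::vector<Cell>& lattice, double omega) {
    assert(omega > 0.0 && omega < 2.0);  // usual BGK relaxation range

    auto collide = [omega](Cell& f) {
        constexpr double c[3] = {-1.0, 0.0, 1.0};               // lattice velocities
        constexpr double w[3] = {1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0};  // lattice weights
        const double rho = f[0] + f[1] + f[2];
        const double u = (f[2] - f[0]) / rho;
        for (std::size_t i = 0; i < 3; ++i) {
            const double cu = c[i] * u;
            // Second-order equilibrium with speed of sound squared = 1/3.
            const double feq =
                w[i] * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * u * u);
            f[i] += omega * (feq - f[i]);  // relax toward equilibrium
        }
    };

#ifdef SKETCH_USE_PARALLEL_POLICY
    // Same source line for multi-core CPUs and, with a suitable compiler, GPUs.
    std::for_each(std::execution::par_unseq, lattice.begin(), lattice.end(), collide);
#else
    // Sequential fallback so the sketch builds without a parallel back-end.
    std::for_each(lattice.begin(), lattice.end(), collide);
#endif
}
```

Calling `collide_all(lattice, omega)` on a `std::vector<Cell>` relaxes every cell in place while conserving its mass and momentum. Which back-end executes the parallel policy depends on the toolchain: GCC's standard library typically needs Intel TBB at link time, and NVIDIA's `nvc++ -stdpar` is one compiler that can offload such algorithm calls to a GPU.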
