Swedish e-Science Research Center, PDC Center for High Performance Computing, KTH Royal Institute of Technology, 100 44 Stockholm, Sweden.
Science for Life Laboratory, Department of Applied Physics, Swedish e-Science Research Center, KTH Royal Institute of Technology, Box 1031, 171 21 Solna, Sweden.
J Chem Phys. 2020 Oct 7;153(13):134110. doi: 10.1063/5.0018516.
The introduction of accelerator devices such as graphics processing units (GPUs) has had a profound impact on molecular dynamics simulations and has enabled order-of-magnitude performance advances using commodity hardware. To fully reap these benefits, it has been necessary to reformulate some of the most fundamental algorithms, including the Verlet list, pair searching, and cutoffs. Here, we present the heterogeneous parallelization and acceleration design of molecular dynamics implemented in the GROMACS codebase over the last decade. The setup involves a general cluster-based approach to pair lists and non-bonded pair interactions that efficiently utilizes both GPU and central processing unit (CPU) single instruction, multiple data (SIMD) acceleration, including the ability to load-balance tasks between CPUs and GPUs. The algorithm work efficiency is tuned for each type of hardware, and to use accelerators more efficiently, we introduce dual pair lists with rolling pruning updates. Combined with new direct GPU-GPU communication and GPU integration, this enables excellent performance from single GPU simulations through strong scaling across multiple GPUs and efficient multi-node parallelization.
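The dual pair list idea mentioned above can be illustrated with a minimal sketch. It is not the GROMACS implementation: it works on individual particles rather than clusters, uses a brute-force search without periodic boundaries, and all function names and radii (`build_pair_list`, `prune_pair_list`, `count_interacting_pairs`, `r_outer`, `r_inner`, `cutoff`) are hypothetical. The assumed scheme is that an outer list is built rarely with a generous buffer, and a shorter inner list is obtained cheaply by pruning the outer list more often.

```python
import itertools
import math


def build_pair_list(positions, radius):
    """Brute-force O(N^2) search: all pairs (i, j), i < j, closer than `radius`."""
    return [
        (i, j)
        for i, j in itertools.combinations(range(len(positions)), 2)
        if math.dist(positions[i], positions[j]) < radius
    ]


def prune_pair_list(outer_list, positions, radius):
    """Keep only the pairs from the outer list that are currently closer than `radius`.

    This touches only the pairs already in the outer list, so it is much
    cheaper than a full pair search.
    """
    return [
        (i, j)
        for i, j in outer_list
        if math.dist(positions[i], positions[j]) < radius
    ]


def count_interacting_pairs(pair_list, positions, cutoff):
    """Count the listed pairs that are actually within the interaction cutoff."""
    return sum(
        1 for i, j in pair_list
        if math.dist(positions[i], positions[j]) < cutoff
    )


if __name__ == "__main__":
    # Assumed illustrative radii: cutoff < r_inner < r_outer.
    cutoff, r_inner, r_outer = 1.0, 1.2, 1.5
    positions = [(0.0, 0.0, 0.0), (0.9, 0.0, 0.0), (2.0, 0.0, 0.0), (3.4, 0.0, 0.0)]

    outer = build_pair_list(positions, r_outer)        # rebuilt infrequently
    inner = prune_pair_list(outer, positions, r_inner) # refreshed more often
    print(len(outer), len(inner), count_interacting_pairs(inner, positions, cutoff))
```

In this sketch the expensive search runs only when the outer list is rebuilt, while the force loop iterates over the shorter inner list. In a real simulation the pruning step would be repeated every few steps as particles move, with the buffers chosen so that no pair can enter the cutoff unnoticed between updates.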