Performance Analysis of CP2K Code for Ab Initio Molecular Dynamics on CPUs and GPUs.

Affiliations

Applied Computer Science Division (CCS-7), Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States.

Chemistry Division (C-IIAC), Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States.

Publication information

J Chem Inf Model. 2022 May 23;62(10):2378-2386. doi: 10.1021/acs.jcim.1c01538. Epub 2022 Apr 22.

PMID:35451847

Abstract

Using a realistic molecular catalyst system, we conduct scaling studies of ab initio molecular dynamics simulations with the popular CP2K code on both Intel Xeon CPU and NVIDIA V100 GPU architectures. Additional performance improvements were obtained by identifying better process placement and affinity settings. Statistical methods were used to assess performance changes despite the variability in runtime from one molecular dynamics timestep to the next. CPU runs performed best with at least four MPI ranks per node, bound evenly across the sockets. This study also showed that using every processing core, with one OpenMP thread per core, performed better than reserving cores for the system. CPU-only simulations scaled at 70% or more of ideal scaling up to 10 compute nodes, beyond which returns diminished more quickly. Simulations on a single 40-core node accelerated by two NVIDIA V100 GPUs achieved a speedup of more than 3.7× over the fastest CPU-only run on a single 36-core node. The same GPU runs also showed a 13% speedup over the fastest time achieved on five CPU-only nodes.
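The abstract reports results as fractions of ideal scaling and as speedups, and it mentions statistical methods for handling per-timestep runtime noise without naming them. The following is a minimal sketch of how such metrics might be computed. Assumptions: strong-scaling efficiency is taken as (t_base · n_base) / (t_n · n), the usual definition, which may differ from the paper's exact normalization; the percentile bootstrap is a method chosen here for illustration, not necessarily the one the authors used; the function names and all timings are hypothetical placeholders, not data from the study.

```python
"""Hypothetical helpers for scaling/speedup metrics and noisy-timing comparison.

All timings below are made-up placeholders, NOT data from the paper.
"""
import random
import statistics


def speedup(t_reference: float, t_new: float) -> float:
    """Speedup of a new configuration relative to a reference (>1 means faster)."""
    return t_reference / t_new


def scaling_efficiency(t_base: float, n_base: int, t_n: float, n: int) -> float:
    """Strong-scaling efficiency relative to a baseline node count.

    Ideal strong scaling gives t_n = t_base * n_base / n, i.e. efficiency 1.0.
    (Assumed definition; the paper's normalization may differ.)
    """
    return (t_base * n_base) / (t_n * n)


def bootstrap_mean_diff_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b) of per-timestep runtimes.

    Each sample is resampled with replacement; an interval that excludes 0
    suggests the difference is not explained by timestep-to-timestep noise.
    (Illustrative method choice, not necessarily the authors'.)
    """
    rng = random.Random(seed)  # fixed seed for reproducible intervals
    diffs = []
    for _ in range(n_resamples):
        ra = rng.choices(a, k=len(a))
        rb = rng.choices(b, k=len(b))
        diffs.append(statistics.fmean(ra) - statistics.fmean(rb))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical mean seconds per MD timestep at several CPU node counts.
    times = {1: 100.0, 2: 52.0, 5: 23.0, 10: 13.5}
    for n, t in times.items():
        eff = scaling_efficiency(times[1], 1, t, n)
        print(f"{n:>2} nodes: {t:6.1f} s/step, efficiency {eff:.0%}")

    # Hypothetical single-node GPU time, compared with the 1-node CPU time.
    gpu_time = 25.0
    print(f"GPU speedup vs 1 CPU node: {speedup(times[1], gpu_time):.2f}x")

    # Hypothetical per-timestep samples for two affinity settings A and B.
    gen = random.Random(1)
    setting_a = [10.0 + gen.gauss(0, 0.5) for _ in range(50)]
    setting_b = [9.5 + gen.gauss(0, 0.5) for _ in range(50)]
    lo, hi = bootstrap_mean_diff_ci(setting_a, setting_b)
    print(f"95% CI for mean(A) - mean(B): [{lo:.3f}, {hi:.3f}] s")
```

If the bootstrap interval for two placement or affinity settings excludes zero, the change likely made a real difference rather than reflecting timestep noise; with heavy-tailed timings, comparing medians instead of means may be more robust.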
