GPU Acceleration of Large-Scale Full-Frequency GW Calculations.

Authors

Yu Victor Wen-Zhe, Govoni Marco

Affiliations

Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States.

Pritzker School of Molecular Engineering, The University of Chicago, Chicago, Illinois 60637, United States.

Publication information

J Chem Theory Comput. 2022 Aug 9;18(8):4690-4707. doi: 10.1021/acs.jctc.2c00241. Epub 2022 Aug 1.

Abstract

Many-body perturbation theory is a powerful method to simulate electronic excitations in molecules and materials starting from the output of density functional theory calculations. By implementing the theory efficiently so as to run at scale on the latest leadership high-performance computing systems, it is possible to extend the scope of GW calculations. We present a GPU acceleration study of the full-frequency GW method as implemented in the WEST code. Excellent performance is achieved through the use of (i) optimized GPU libraries, e.g., cuFFT and cuBLAS, (ii) a hierarchical parallelization strategy that minimizes CPU-CPU, CPU-GPU, and GPU-GPU data transfer operations, (iii) nonblocking MPI communications that overlap with GPU computations, and (iv) mixed precision in selected portions of the code. A series of performance benchmarks has been carried out on leadership high-performance computing systems, showing a substantial speedup of the GPU-accelerated version of WEST with respect to its CPU version. Good strong and weak scaling is demonstrated using up to 25 920 GPUs. Finally, we showcase the capability of the GPU version of WEST for large-scale, full-frequency GW calculations of realistic systems, e.g., a nanostructure, an interface, and a defect, comprising up to 10 368 valence electrons.
