An Efficient RI-MP2 Algorithm for Distributed Many-GPU Architectures.

Authors

Snowdon Calum, Barca Giuseppe M J

Affiliations

School of Computing, Australian National University, Canberra 2600, Australia.

School of Computing and Information Systems, University of Melbourne, Melbourne 3010, Australia.

Publication information

J Chem Theory Comput. 2024 Nov 12;20(21):9394-9406. doi: 10.1021/acs.jctc.4c00814. Epub 2024 Oct 18.

DOI:10.1021/acs.jctc.4c00814
PMID:39422609
Abstract

Second-order Møller-Plesset perturbation theory (MP2) using the Resolution of the Identity approximation (RI-MP2) is a widely used method for computing molecular energies beyond the Hartree-Fock mean-field approximation. However, its high computational cost and lack of efficient algorithms for modern supercomputing architectures limit its applicability to large molecules. In this paper, we present the first distributed-memory many-GPU RI-MP2 algorithm explicitly designed to utilize hundreds of GPU accelerators for every step of the computation. Our novel algorithm achieves near-peak performance on GPU-based supercomputers through the development of a distributed memory algorithm for forming RI-MP2 intermediate tensors with zero internode communication, except for a single asynchronous broadcast, and a distributed memory algorithm for the energy reduction step, capable of sustaining near-peak performance on clusters with several hundred GPUs. Comparative analysis shows our implementation outperforms state-of-the-art quantum chemistry software by over 3.5 times in speed while achieving an 8-fold reduction in computational power consumption. Benchmarking on the Perlmutter supercomputer, our algorithm achieves 11.8 PFLOP/s (83% of peak performance) while performing the RI-MP2 energy calculation on a 314-water cluster with 7850 primary and 30,144 auxiliary basis functions in 4 min on 180 nodes and 720 A100 GPUs. This performance represents a substantial improvement over traditional CPU-based methods, demonstrating significant time-to-solution and power consumption benefits of leveraging modern GPU-accelerated computing environments for quantum chemistry calculations.
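
To make the "intermediate tensors" and "energy reduction step" mentioned in the abstract concrete, here is a minimal serial sketch of the standard closed-shell RI-MP2 energy expression in NumPy. It is not the distributed many-GPU algorithm of the paper; it only illustrates the textbook formula that such algorithms evaluate. The name `rimp2_energy`, the tensor layout `B[P, i, a]`, and the random test data are assumptions made for illustration.

```python
import numpy as np


def rimp2_energy(B, eps_occ, eps_virt):
    """Closed-shell RI-MP2 correlation energy (textbook formula, serial sketch).

    B        : array (n_aux, n_occ, n_virt); assumed layout B[P, i, a], so that
               (ia|jb) ~= sum_P B[P, i, a] * B[P, j, b]
    eps_occ  : occupied orbital energies, shape (n_occ,)
    eps_virt : virtual orbital energies, shape (n_virt,)
    """
    n_occ = eps_occ.size
    energy = 0.0
    for i in range(n_occ):
        for j in range(n_occ):
            # One matrix multiply per (i, j) pair builds V[a, b] ~= (ia|jb).
            V = B[:, i, :].T @ B[:, j, :]
            # Orbital-energy denominator eps_i + eps_j - eps_a - eps_b.
            denom = (eps_occ[i] + eps_occ[j]
                     - eps_virt[:, None] - eps_virt[None, :])
            # (ib|ja) is V transposed; accumulate (ia|jb) * [2(ia|jb) - (ib|ja)] / denom.
            energy += np.sum(V * (2.0 * V - V.T) / denom)
    return energy


# Hypothetical toy input: random numbers, not a real molecule.
rng = np.random.default_rng(0)
n_aux, n_occ, n_virt = 12, 3, 5
B_demo = rng.standard_normal((n_aux, n_occ, n_virt))
eps_occ_demo = -1.0 - rng.random(n_occ)   # occupied energies below zero
eps_virt_demo = 1.0 + rng.random(n_virt)  # virtual energies above zero
e_demo = rimp2_energy(B_demo, eps_occ_demo, eps_virt_demo)
```

In this form the cost is dominated by the per-pair matrix multiplications, which is why the energy step maps well onto GPU hardware; how the work and the tensor `B` are partitioned across nodes is the subject of the paper and is not reproduced here.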

Similar articles

1
An Efficient RI-MP2 Algorithm for Distributed Many-GPU Architectures.
J Chem Theory Comput. 2024 Nov 12;20(21):9394-9406. doi: 10.1021/acs.jctc.4c00814. Epub 2024 Oct 18.
2
High-Performance Multi-GPU Analytic RI-MP2 Energy Gradients.
J Chem Theory Comput. 2024 Mar 26;20(6):2505-2519. doi: 10.1021/acs.jctc.3c01424. Epub 2024 Mar 8.
3
Massively parallel algorithm and implementation of RI-MP2 energy calculation for peta-scale many-core supercomputers.
J Comput Chem. 2016 Nov 15;37(30):2623-2633. doi: 10.1002/jcc.24491. Epub 2016 Sep 16.
4
Accelerating Coupled-Cluster Calculations with GPUs: An Implementation of the Density-Fitted CCSD(T) Approach for Heterogeneous Computing Architectures Using OpenMP Directives.
J Chem Theory Comput. 2023 Nov 14;19(21):7640-7657. doi: 10.1021/acs.jctc.3c00876. Epub 2023 Oct 25.
5
Analytical Gradient Using Cluster-in-Molecule RI-MP2 Method for the Geometry Optimizations of Large Systems.
J Chem Theory Comput. 2024 May 14;20(9):3626-3636. doi: 10.1021/acs.jctc.4c00087. Epub 2024 Apr 16.
6
Porting fragmentation methods to GPUs using an OpenMP API: Offloading the resolution-of-the-identity second-order Møller-Plesset perturbation method.
J Chem Phys. 2023 Apr 28;158(16). doi: 10.1063/5.0143424.
7
Electron Correlation in the Condensed Phase from a Resolution of Identity Approach Based on the Gaussian and Plane Waves Scheme.
J Chem Theory Comput. 2013 Jun 11;9(6):2654-71. doi: 10.1021/ct4002202. Epub 2013 May 28.
8
The GPU-enabled divide-expand-consolidate RI-MP2 method (DEC-RI-MP2).
J Comput Chem. 2017 Feb 5;38(4):228-237. doi: 10.1002/jcc.24678. Epub 2016 Dec 7.
9
Communication: A reduced scaling J-engine based reformulation of SOS-MP2 using graphics processing units.
J Chem Phys. 2014 Aug 7;141(5):051106. doi: 10.1063/1.4891797.
10
MPI/OpenMP hybrid parallel algorithm for resolution of identity second-order Møller-Plesset perturbation calculation of analytical energy gradient for massively parallel multicore supercomputers.
J Comput Chem. 2017 Mar 30;38(8):489-507. doi: 10.1002/jcc.24701.