Parallel Implementation of the Density Matrix Renormalization Group Method Achieving a Quarter petaFLOPS Performance on a Single DGX-H100 GPU Node.

Author information

Menczer Andor, van Damme Maarten, Rask Alan, Huntington Lee, Hammond Jeff, Xantheas Sotiris S, Ganahl Martin, Legeza Örs

Affiliations

Strongly Correlated Systems Lendület Research Group, Wigner Research Centre for Physics, H-1525 Budapest, Hungary.

Eötvös Loránd University, Pázmány Péter Sétány 1/C, 1117 Budapest, Hungary.

Publication information

J Chem Theory Comput. 2024 Oct 8;20(19):8397-8404. doi: 10.1021/acs.jctc.4c00903. Epub 2024 Sep 19.

DOI:10.1021/acs.jctc.4c00903

PMID:39297788

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11465466/

Abstract

We report cutting-edge performance results for a single-node hybrid CPU-multi-GPU implementation of the spin-adapted Density Matrix Renormalization Group (DMRG) method on current state-of-the-art NVIDIA DGX-H100 architectures. We evaluate the performance of DMRG electronic structure calculations for the active compounds of FeMoco, the primary cofactor of nitrogenase, and of cytochrome P450 (CYP) enzymes, with complete active space (CAS) sizes of up to 113 electrons in 76 orbitals [CAS(113, 76)] and 63 electrons in 58 orbitals [CAS(63, 58)], respectively. We achieve 246 teraFLOPS of sustained performance, an improvement of more than 2.5× over the performance achieved on DGX-A100 architectures and an 80× acceleration compared to an OpenMP-parallelized implementation on a 128-core CPU architecture. Our work highlights the ability of tensor network algorithms to efficiently utilize high-performance multi-GPU hardware and shows that the combination of tensor networks with modern large-scale GPU accelerators can pave the way toward solving some of the most challenging problems in quantum chemistry and beyond.
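The performance figures quoted above can be related to one another with a little arithmetic. The sketch below is illustrative only: it assumes that both speedup factors refer to the same sustained-throughput metric as the 246 teraFLOPS figure, which the abstract does not state explicitly (the 80× figure may instead describe wall-clock time). The variable and function names are hypothetical, and the derived baselines are implied estimates, not numbers reported by the authors.

```python
# Figures quoted in the abstract.
H100_SUSTAINED_TFLOPS = 246.0  # sustained performance on a DGX-H100 node
MIN_SPEEDUP_VS_A100 = 2.5      # "more than 2.5x" relative to DGX-A100
SPEEDUP_VS_CPU = 80.0          # "80x" relative to a 128-core OpenMP CPU run


def implied_baseline(current_tflops: float, speedup: float) -> float:
    """Return the baseline throughput implied by a speedup factor.

    Assumption: the speedup is a ratio of sustained throughputs.
    """
    return current_tflops / speedup


# Because the A100 speedup is "more than" 2.5x, this is an upper bound.
a100_upper_bound_tflops = implied_baseline(H100_SUSTAINED_TFLOPS, MIN_SPEEDUP_VS_A100)

# Implied CPU throughput, valid only under the same-metric assumption.
cpu_implied_tflops = implied_baseline(H100_SUSTAINED_TFLOPS, SPEEDUP_VS_CPU)

# Fraction of one petaFLOPS (1 petaFLOPS = 1000 teraFLOPS).
fraction_of_petaflops = H100_SUSTAINED_TFLOPS / 1000.0

print(f"Implied DGX-A100 upper bound: {a100_upper_bound_tflops:.1f} TFLOPS")
print(f"Implied 128-core CPU estimate: {cpu_implied_tflops:.3f} TFLOPS")
print(f"Fraction of a petaFLOPS: {fraction_of_petaflops:.3f}")
```

Under these assumptions, the "quarter petaFLOPS" in the title corresponds to 246 of 1000 teraFLOPS, and the DGX-A100 result would lie below roughly 98 teraFLOPS.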

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bcb0/11465466/1f2a9f6977cf/ct4c00903_0001.jpg
