Kutzner Carsten, van der Spoel David, Fechner Martin, Lindahl Erik, Schmitt Udo W, de Groot Bert L, Grubmüller Helmut
Department of Theoretical and Computational Biophysics, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany.
J Comput Chem. 2007 Sep;28(12):2075-84. doi: 10.1002/jcc.20703.
We investigate the parallel scaling of the GROMACS molecular dynamics code on Ethernet Beowulf clusters and the prerequisites for decent scaling even on such clusters with only limited bandwidth and high latency. GROMACS 3.3 scales well on supercomputers like the IBM p690 (Regatta) and on Linux clusters with a special interconnect like Myrinet or Infiniband. Because of the high single-node performance of GROMACS, however, on the widely used Ethernet-switched clusters the scaling typically breaks down as soon as more than two computer nodes are involved, limiting the absolute speedup to about 3 relative to a single-CPU run. With the LAM MPI implementation, the main scaling bottleneck is identified to be the all-to-all communication required in every time step. During such an all-to-all communication step, a huge number of messages floods the network, and as a result many TCP packets are lost. We show that Ethernet flow control prevents network congestion and leads to substantial scaling improvements; for 16 CPUs, for example, a speedup of 11 has been achieved. For more nodes, however, this mechanism also fails. With an optimized all-to-all routine that sends the data in an ordered fashion, we show that packet loss can be prevented completely for any number of multi-CPU nodes. Thus, the GROMACS scaling improves dramatically, even for switches that lack flow control. In addition, for the common HP ProCurve 2848 switch we find that how the nodes are connected to the switch's ports is essential for optimum all-to-all performance. This is also demonstrated for the example of the Car-Parrinello MD code.
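The ordered all-to-all described in the abstract can be illustrated with a pairwise-exchange schedule in which, during each phase, every rank sends to exactly one partner and receives from exactly one partner, so no switch port is flooded with simultaneous messages. The following C/MPI sketch is only an illustration of that idea, not the routine implemented in GROMACS; the phase schedule, buffer layout, and the name ordered_alltoall are assumptions.

```c
/* ordered_alltoall.c -- hypothetical sketch of an ordered all-to-all.
 * In each of the P phases every rank is paired with exactly one send
 * partner and one receive partner, so messages traverse the switch in
 * an ordered fashion instead of all at once (the idea behind the
 * paper's routine; the exact schedule in GROMACS may differ). */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

/* Exchange 'count' ints with every other rank, one partner per phase. */
static void ordered_alltoall(const int *sendbuf, int *recvbuf,
                             int count, MPI_Comm comm)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    for (int phase = 0; phase < nprocs; phase++) {
        int dst = (rank + phase) % nprocs;           /* send target    */
        int src = (rank - phase + nprocs) % nprocs;  /* receive source */

        if (dst == rank) {
            /* phase 0 is the self-exchange: just copy locally */
            memcpy(recvbuf + rank * count, sendbuf + rank * count,
                   count * sizeof(int));
        } else {
            /* paired send/receive, one partner per phase */
            MPI_Sendrecv(sendbuf + dst * count, count, MPI_INT, dst, 0,
                         recvbuf + src * count, count, MPI_INT, src, 0,
                         comm, MPI_STATUS_IGNORE);
        }
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int count = 1024;  /* ints destined for each rank (dummy size) */
    int *sendbuf = malloc(nprocs * count * sizeof(int));
    int *recvbuf = malloc(nprocs * count * sizeof(int));
    for (int i = 0; i < nprocs * count; i++)
        sendbuf[i] = rank;   /* dummy payload */

    ordered_alltoall(sendbuf, recvbuf, count, MPI_COMM_WORLD);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}
```

Compiled with mpicc and run with mpirun, a schedule like this leaves each node with at most one outstanding send and one outstanding receive per phase, which is the property the paper relies on to avoid TCP packet loss on Ethernet switches without flow control.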