Ruymgaart A Peter, Elber Ron
Department of Chemistry and Biochemistry, Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin, TX 78712.
J Chem Theory Comput. 2012 Nov 13;8(11):4624-4636. doi: 10.1021/ct300324k. Epub 2012 Aug 21.
We report Graphics Processing Unit (GPU) and OpenMP parallel implementations of water-specific force calculations and of bond constraints for use in Molecular Dynamics (MD) simulations. We focus on a typical laboratory computing environment in which a CPU with a few cores is attached to a GPU. We discuss the design of the code in detail and demonstrate performance comparable to that of highly optimized codes such as GROMACS. Besides speed, our code shows excellent energy conservation. Water-specific lists allow efficient calculation of non-bonded interactions that include water molecules and yield a speed-up factor of more than 40 on the GPU, compared to code optimized for a single CPU core, for systems larger than 20,000 atoms. This is a four-fold improvement over the factor of 10 reported for our initial GPU implementation, which did not include water-specific code. Another optimization is the implementation of constrained dynamics entirely on the GPU. The routine, which enforces constraints on all bonds, runs in parallel on multiple OpenMP cores or entirely on the GPU. It is based on a Conjugate Gradient solution for the Lagrange multipliers (CG SHAKE). The GPU implementation is partially in double precision and requires no communication with the CPU during the execution of the SHAKE algorithm. The (parallel) implementation of SHAKE allows the time step to be increased to 2.0 fs while maintaining excellent energy conservation. Interestingly, CG SHAKE is faster than the usual bond relaxation algorithm even on a single core if high accuracy is required. The significant speed-up of the optimized components shifts the computational bottleneck of the MD calculation to the reciprocal part of Particle Mesh Ewald (PME).
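The idea of solving for the SHAKE Lagrange multipliers with Conjugate Gradient can be illustrated with a minimal serial sketch in plain Python. It is not the authors' GPU or OpenMP code. It assumes bond constraints of the form |r_i - r_j|^2 = d^2 and a fixed symmetric matrix A = G M^-1 G^T, with the constraint gradients G taken at a reference geometry that already satisfies the constraints; that choice is an assumption made here so that plain CG applies. All function names (`cg_shake`, `conjugate_gradient`, `apply_A`, `displacement`, `bond_vectors`) are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bond_vectors(pos, bonds):
    """Return r_i - r_j for every constrained bond (i, j, length)."""
    return [[pos[i][a] - pos[j][a] for a in range(3)] for i, j, _ in bonds]

def displacement(lam, ref_vecs, bonds, inv_mass, n_atoms):
    """Per-atom correction M^-1 G^T lam, with G evaluated at the reference geometry."""
    disp = [[0.0] * 3 for _ in range(n_atoms)]
    for l, v, (i, j, _) in zip(lam, ref_vecs, bonds):
        for a in range(3):
            disp[i][a] += 2.0 * l * v[a] * inv_mass[i]
            disp[j][a] -= 2.0 * l * v[a] * inv_mass[j]
    return disp

def apply_A(lam, ref_vecs, bonds, inv_mass, n_atoms):
    """Matrix-free product A @ lam, where A = G M^-1 G^T (symmetric)."""
    disp = displacement(lam, ref_vecs, bonds, inv_mass, n_atoms)
    return [2.0 * sum(v[a] * (disp[i][a] - disp[j][a]) for a in range(3))
            for v, (i, j, _) in zip(ref_vecs, bonds)]

def conjugate_gradient(matvec, b, rel_tol=1e-10, max_iter=200):
    """Solve A x = b for a symmetric positive definite A given only as matvec."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the initial guess x = 0
    p = list(r)
    rr = dot(r, r)
    bb = rr
    for _ in range(max_iter):
        if rr <= rel_tol * rel_tol * bb:
            break
        Ap = matvec(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

def cg_shake(pos, ref_pos, bonds, masses, tol=1e-10, max_outer=50):
    """Move pos onto the constraint surface; returns corrected positions."""
    n_atoms = len(pos)
    inv_mass = [1.0 / m for m in masses]
    ref_vecs = bond_vectors(ref_pos, bonds)
    pos = [list(p) for p in pos]          # work on a copy
    for _ in range(max_outer):
        # Constraint violations sigma_k = |r_i - r_j|^2 - d_k^2
        sigma = [dot(v, v) - d * d
                 for v, (_, _, d) in zip(bond_vectors(pos, bonds), bonds)]
        if max(abs(s) for s in sigma) < tol:
            break
        # Linearized condition: A lam = -sigma, solved with CG
        lam = conjugate_gradient(
            lambda x: apply_A(x, ref_vecs, bonds, inv_mass, n_atoms),
            [-s for s in sigma])
        disp = displacement(lam, ref_vecs, bonds, inv_mass, n_atoms)
        for i in range(n_atoms):
            for a in range(3):
                pos[i][a] += disp[i][a]
    return pos
```

Calling `cg_shake(new_positions, old_positions, bonds, masses)` after an unconstrained integration step returns positions whose constrained bond lengths match the targets to within the tolerance. Because the matrix is applied without being stored, each product is a loop over bonds, which is the kind of operation that lends itself to parallel execution.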