Ruymgaart A Peter, Cardenas Alfredo E, Elber Ron
Institute for Computational Engineering and Sciences, Department of Chemistry and Biochemistry, University of Texas at Austin, Austin, Texas 78712.
J Chem Theory Comput. 2011 Aug 26;7(10):3072-3082. doi: 10.1021/ct200360f.
We report an optimized version of the molecular dynamics program MOIL that runs on a shared-memory system with OpenMP and exploits the power of a Graphics Processing Unit (GPU). The model is a heterogeneous computing system on a single node, with several cores sharing the same memory and a GPU. This is a typical laboratory tool that provides excellent performance at minimal cost. Besides performance, emphasis is placed on the accuracy and stability of the algorithm, probed by energy conservation for explicit-solvent, atomically detailed models. Energy conservation is especially critical for long simulations because of the phenomenon known as "energy drift", in which energy errors accumulate linearly as a function of simulation time. To achieve long-time dynamics with acceptable accuracy, the drift must be particularly small. We identify several means of controlling long-time numerical accuracy while maintaining excellent speedup. To maintain a high level of energy conservation, SHAKE and the Ewald reciprocal summation are run in double precision. Double-precision summation of real-space non-bonded interactions improves energy conservation. In our best option, using a 1 fs time step while constraining the distances of all bonds, the energy drift is undetectable in a 10 ns simulation of solvated dihydrofolate reductase (DHFR). Faster options, which apply SHAKE only to bonds with hydrogen atoms, are also very well behaved and have drifts of less than 1 kcal/mol per nanosecond for the same system. CPU/GPU implementations require changes in programming models. We consider the use of a list of neighbors and quadratic versus linear interpolation in lookup tables of different sizes. Quadratic interpolation with a smaller number of grid points is faster than linear lookup tables (with a finer representation) without loss of accuracy. Atomic neighbor lists were found to be most efficient. Typical speedups are about a factor of 10 compared to a single-core single-precision code.
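The trade-off between table size and interpolation order mentioned in the abstract can be illustrated with a small sketch. This is not MOIL code. The tabulated function (a real-space Ewald-like term, erfc(beta*r)/r), the value of `beta`, the range of `r`, the table sizes, and all function names are assumptions chosen only for illustration. The sketch compares accuracy only; the speed advantage reported in the abstract depends on GPU memory behavior and is not measured here.

```python
import math

BETA = 0.3            # assumed Ewald splitting parameter (1/Angstrom)
R_MIN, R_MAX = 1.0, 9.0   # assumed tabulation range (Angstrom)

def pair_term(r):
    """Illustrative real-space Ewald-like term erfc(beta*r)/r."""
    return math.erfc(BETA * r) / r

def build_table(f, lo, hi, n):
    """Tabulate f at n+1 uniformly spaced nodes on [lo, hi]."""
    h = (hi - lo) / n
    return [f(lo + i * h) for i in range(n + 1)], h

def lookup_linear(table, lo, h, x):
    """Two-point (linear) interpolation between neighboring nodes."""
    n = len(table) - 1
    i = min(int((x - lo) / h), n - 1)
    t = (x - lo) / h - i
    return table[i] + t * (table[i + 1] - table[i])

def lookup_quadratic(table, lo, h, x):
    """Three-point (quadratic Lagrange) interpolation about the nearest node."""
    n = len(table) - 1
    # Clamp to an interior node so that nodes i-1 and i+1 both exist.
    i = min(max(int((x - lo) / h + 0.5), 1), n - 1)
    t = (x - lo) / h - i
    ym, y0, yp = table[i - 1], table[i], table[i + 1]
    return y0 + 0.5 * t * (yp - ym) + 0.5 * t * t * (yp - 2.0 * y0 + ym)

def max_error(lookup, table, h, samples=20000):
    """Largest absolute deviation from the exact function on a fine sample."""
    worst = 0.0
    for k in range(samples + 1):
        x = R_MIN + (R_MAX - R_MIN) * k / samples
        worst = max(worst, abs(lookup(table, R_MIN, h, x) - pair_term(x)))
    return worst

# A fine linear table versus a quadratic table with four times fewer intervals.
lin_table, lin_h = build_table(pair_term, R_MIN, R_MAX, 4096)
quad_table, quad_h = build_table(pair_term, R_MIN, R_MAX, 1024)

lin_err = max_error(lookup_linear, lin_table, lin_h)
quad_err = max_error(lookup_quadratic, quad_table, quad_h)

print(f"linear,    4096 intervals: max error {lin_err:.2e}")
print(f"quadratic, 1024 intervals: max error {quad_err:.2e}")
```

Under these assumptions the coarser quadratic table is at least as accurate as the finer linear one, which is consistent with the abstract's statement that accuracy is not lost when the table is made smaller.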
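Because energy drift is described in the abstract as a linear accumulation of error with simulation time, one common way to quantify it is the slope of a least-squares line through the total energy as a function of time. The sketch below is a minimal illustration of that idea and is not taken from MOIL. The function name `fit_drift` and the synthetic trajectory (a 0.5 kcal/mol per ns drift plus a small oscillation standing in for fluctuations) are assumptions made for the example, not data from the paper.

```python
import math

def fit_drift(times_ns, energies):
    """Least-squares line through (time, energy).

    Returns (slope, intercept); the slope is the drift in energy units per ns.
    """
    n = len(times_ns)
    t_mean = sum(times_ns) / n
    e_mean = sum(energies) / n
    sxx = sum((t - t_mean) ** 2 for t in times_ns)
    sxy = sum((t - t_mean) * (e - e_mean) for t, e in zip(times_ns, energies))
    slope = sxy / sxx
    return slope, e_mean - slope * t_mean

# Synthetic 10 ns energy trace (hypothetical values, kcal/mol):
# a constant offset, an imposed linear drift, and a bounded oscillation.
TRUE_DRIFT = 0.5  # kcal/mol per ns, assumed for the example
times = [0.01 * k for k in range(1001)]
energies = [-50000.0 + TRUE_DRIFT * t + 0.2 * math.sin(37.0 * t) for t in times]

drift, offset = fit_drift(times, energies)
print(f"estimated drift: {drift:.3f} kcal/mol per ns")
```

A fitted slope that is indistinguishable from zero over the whole run corresponds to the "undetectable" drift case in the abstract, while a slope below 1 kcal/mol per ns corresponds to the faster options.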