Hardy David J, Stone John E, Schulten Klaus
Beckman Institute, University of Illinois at Urbana-Champaign, 405 N. Mathews Ave., Urbana, IL, 61801.
Parallel Comput. 2009 Mar 1;35(3):164-177. doi: 10.1016/j.parco.2008.12.005.
Physical and engineering practicalities involved in microprocessor design have resulted in flat performance growth for traditional single-core microprocessors. The urgent need for continuing increases in the performance of scientific applications requires the use of many-core processors and accelerators such as graphics processing units (GPUs). This paper discusses GPU acceleration of the multilevel summation method for computing electrostatic potentials and forces for a system of charged atoms, which is a problem of paramount importance in biomolecular modeling applications. We present and test a new GPU algorithm for the long-range part of the potentials that computes a cutoff pair potential between lattice points, essentially convolving a fixed 3-D lattice of "weights" over all sub-cubes of a much larger lattice. The implementation exploits the different memory subsystems provided on the GPU to stream optimally sized data sets through the multiprocessors. We demonstrate for the full multilevel summation calculation speedups of up to 26 using a single GPU and 46 using multiple GPUs, enabling the computation of a high-resolution map of the electrostatic potential for a system of 1.5 million atoms in under 12 seconds.
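The lattice cutoff computation described above can be illustrated with a sequential CPU reference sketch. This is not the paper's GPU implementation: the function names, the non-periodic boundary handling, and the placeholder kernel `1 / (1 + r)` are assumptions made for illustration only, and the actual weights in multilevel summation come from the method's smoothed interaction kernel. The sketch shows only the structure of the computation, in which a fixed cube of weights is applied around every point of a larger charge lattice.

```python
def make_weights(radius, kernel):
    """Build a (2*radius+1)^3 stencil of weights; entries beyond the cutoff stay zero."""
    n = 2 * radius + 1
    w = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for a in range(n):
        for b in range(n):
            for c in range(n):
                dx, dy, dz = a - radius, b - radius, c - radius
                d2 = dx * dx + dy * dy + dz * dz
                if d2 <= radius * radius:
                    # Weight depends only on the distance between lattice points.
                    w[a][b][c] = kernel(d2 ** 0.5)
    return w


def lattice_cutoff(q, w, radius):
    """Potential at each lattice point: weighted sum of charges within the stencil.

    Assumption: the lattice is non-periodic, so charges outside it count as zero.
    """
    nx, ny, nz = len(q), len(q[0]), len(q[0][0])
    e = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                total = 0.0
                # Clamp the sub-cube around (i, j, k) to the lattice bounds.
                for a in range(max(0, i - radius), min(nx, i + radius + 1)):
                    for b in range(max(0, j - radius), min(ny, j + radius + 1)):
                        for c in range(max(0, k - radius), min(nz, k + radius + 1)):
                            weight = w[a - i + radius][b - j + radius][c - k + radius]
                            total += weight * q[a][b][c]
                e[i][j][k] = total
    return e


def placeholder_kernel(r):
    """Hypothetical stand-in kernel; NOT the smoothed kernel used by the method."""
    return 1.0 / (1.0 + r)


if __name__ == "__main__":
    size, radius = 5, 1
    charges = [[[0.0] * size for _ in range(size)] for _ in range(size)]
    charges[2][2][2] = 1.0  # single unit charge at the lattice center
    weights = make_weights(radius, placeholder_kernel)
    potential = lattice_cutoff(charges, weights, radius)
    print(potential[2][2][2], potential[3][2][2], potential[3][3][2])
```

Every output point reads the same small weight stencil while sweeping over a different sub-cube of the charge lattice. That access pattern is what lets a GPU version keep the weights in fast on-chip memory and stream the lattice data through the multiprocessors, as the abstract describes.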