State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China.
School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China.
J Chem Theory Comput. 2011 Jun 14;7(6):1595-603. doi: 10.1021/ct100728k. Epub 2011 May 25.
All-atom normal mode analysis (NMA) is an efficient way to predict the collective motions of a given macromolecule, which is essential for understanding protein biological function and for drug design. However, the calculations are constrained by runtime, mainly because the required diagonalization of the Hessian matrix by Householder-QR transformation is computationally expensive. In this paper, we demonstrate the parallel computing power of the graphics processing unit (GPU) in NMA by mapping the Householder-QR transformation onto the GPU using Compute Unified Device Architecture (CUDA). The results show that GPU-accelerated all-atom NMA significantly reduces the runtime of diagonalization, achieving over 20× speedup relative to CPU-based NMA. In addition, we analyzed the influence of numerical precision on both the performance and the accuracy of the GPU computation. Although GPU double-precision performance is in theory lower than single-precision performance, our approach obtained more accurate results and an acceptable speedup in double precision by reducing the data transfer time to a minimum. Finally, the inherent drawbacks of the GPU and a corresponding solution to its limitation in computational scale are also discussed.
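As a rough illustration of the diagonalization step described above, the following CPU-only NumPy sketch reduces a symmetric matrix to tridiagonal form with Householder reflections and then compares single- and double-precision eigenvalues. It is not the authors' CUDA implementation: the function names are hypothetical, the "Hessian" is a random symmetric stand-in rather than a molecular Hessian, and the QR iteration on the tridiagonal matrix is replaced here by `numpy.linalg.eigvalsh` for brevity.

```python
import numpy as np


def householder_tridiagonalize(a):
    """Reduce a symmetric matrix to tridiagonal form by Householder reflections."""
    t = np.array(a, copy=True)
    n = t.shape[0]
    for k in range(n - 2):
        x = t[k + 1:, k].copy()
        norm_x = np.linalg.norm(x)
        if norm_x == 0:
            continue  # column is already in the required form
        # Pick the sign that avoids cancellation when forming v = x - alpha * e1.
        alpha = -norm_x if x[0] >= 0 else norm_x
        v = x
        v[0] -= alpha
        v /= np.linalg.norm(v)
        # Apply H = I - 2 v v^T from the left (rows k+1..n-1) ...
        rows = t[k + 1:, k:]
        rows -= 2.0 * np.outer(v, v @ rows)
        # ... and from the right (columns k+1..n-1), preserving symmetry.
        cols = t[k:, k + 1:]
        cols -= 2.0 * np.outer(cols @ v, v)
    return t


def eigenvalues_via_tridiagonal(a):
    """Eigenvalues of symmetric `a`, computed from its tridiagonal reduction."""
    t = householder_tridiagonalize(a)
    d = np.diag(t)       # main diagonal
    e = np.diag(t, 1)    # first superdiagonal
    # Rebuild a clean tridiagonal matrix, discarding rounding noise elsewhere.
    tri = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
    # Assumption: eigvalsh stands in for the QR iteration stage.
    return np.linalg.eigvalsh(tri)


# Random symmetric positive semidefinite stand-in for a Hessian (assumption).
rng = np.random.default_rng(0)
b = rng.standard_normal((30, 30))
hessian = b @ b.T

reference = np.linalg.eigvalsh(hessian)  # float64 reference spectrum
ev64 = eigenvalues_via_tridiagonal(hessian)
ev32 = eigenvalues_via_tridiagonal(hessian.astype(np.float32))

err64 = np.max(np.abs(ev64 - reference))
err32 = np.max(np.abs(ev32.astype(np.float64) - reference))
print(f"max eigenvalue error, float64: {err64:.3e}")
print(f"max eigenvalue error, float32: {err32:.3e}")
```

On a toy matrix like this the single-precision error is typically several orders of magnitude larger than the double-precision error, which is the accuracy side of the precision trade-off the abstract mentions; the sketch says nothing about GPU runtime or data transfer cost.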