Liwo Adam, Ołdziej Stanisław, Czaplewski Cezary, Kleinerman Dana S, Blood Philip, Scheraga Harold A
Faculty of Chemistry, University of Gdańsk, Sobieskiego 18, 80-952 Gdańsk, Poland.
J Chem Theory Comput. 2010 Mar 9;6(3):890-909. doi: 10.1021/ct9004068.
We report the implementation of our united-residue UNRES force field for simulations of protein structure and dynamics on massively parallel architectures. In addition to the coarse-grained parallelism implemented in our previous work, in which each conformation was treated by a different task, we introduce a fine-grained level in which energy and gradient evaluation are split among several tasks. The parallel code was built with the Message Passing Interface (MPI) libraries. Parallel performance was tested with canonical and replica-exchange molecular dynamics on a professional Beowulf cluster (Xeon Quad Core), a Cray XT3 supercomputer, and two IBM BlueGene/P supercomputers. On the IBM BlueGene/P, about 50% efficiency and a 120-fold speed-up of the fine-grained part were achieved for a single trajectory of a 767-residue protein using 256 processors per trajectory. Because it averages over the fast degrees of freedom, UNRES provides an effective 1000-fold speed-up compared to the experimental time scale and therefore enables millisecond-scale simulations of proteins with 500 or more amino-acid residues in days of wall-clock time.
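The efficiency and speed-up figures quoted for the BlueGene/P run can be related with the standard definition of parallel efficiency, speed-up divided by processor count. The short sketch below only checks that arithmetic; the function name `parallel_efficiency` is hypothetical and is not part of the UNRES code, and the abstract does not state exactly how the authors computed their figure.

```python
def parallel_efficiency(speedup: float, n_procs: int) -> float:
    """Return parallel efficiency, defined as speed-up / number of processors.

    Illustrative helper (an assumption, not taken from UNRES).
    """
    if n_procs <= 0:
        raise ValueError("n_procs must be positive")
    return speedup / n_procs


# Values quoted in the abstract: 120-fold speed-up on 256 processors per trajectory.
efficiency = parallel_efficiency(120.0, 256)
print(f"efficiency = {efficiency:.3f}")  # prints "efficiency = 0.469"
```

Under this definition, 120/256 is roughly 0.47, which is consistent with the "about 50%" efficiency reported for the fine-grained part.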