Májek Peter, Elber Ron
Department of Computer Science, Upson Hall 4130, Cornell University, Ithaca, New York 14853-7501, USA.
Proteins. 2009 Sep;76(4):822-36. doi: 10.1002/prot.22388.
A coarse-grained potential for protein simulations and fold ranking is presented. The potential is based on a two-point model of individual amino acids and a specific implementation of hydrogen bonding. Parameters are determined for distance-dependent pair interactions, pseudo bonds, angles, and torsions. A scaling factor for the hydrogen bonding term is also determined. Iterative sampling for 4867 proteins reproduces the distributions of internal coordinates and distances observed in the Protein Data Bank. The adjustment of the potential and the resampling are in the spirit of the generalized ensemble approach. No native structure information (e.g., secondary structure) is used in the calculation of the potential or in the simulation of a particular protein. The potential is subjected to two tests: (i) simulations of 956 globular proteins in the neighborhood of their native folds (these proteins were not used in the training set) and (ii) discrimination between native and decoy structures for 2470 proteins with 305,000 decoys and the "Decoys 'R' Us" dataset. In the first test, 58% of the tested proteins stay within 5 Å of the native fold in molecular dynamics simulations of more than 20 nanoseconds using the new potential. The potential is also useful in differentiating between correct and approximate folds, providing a significant signal for structure prediction algorithms. Sampling with the potential consistently regenerates the distributions of distances and internal coordinates that it learned. Nevertheless, molecular dynamics simulations also find structures that reproduce the learned distributions but are far from the native fold.
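The native-versus-decoy test described in (ii) can be illustrated with a minimal sketch. It assumes that each structure has already been assigned a single energy by some potential and that lower energy means a better score. The function name, the z-score measure, and the example energies are hypothetical and are not taken from the paper.

```python
import numpy as np

def native_rank_and_zscore(native_energy, decoy_energies):
    """Rank the native structure among its decoys (rank 1 = lowest energy)
    and report its z-score relative to the decoy energy distribution.

    Assumption: energies are plain floats in arbitrary units and the
    decoy energies are not all identical (non-zero standard deviation).
    """
    decoys = np.asarray(decoy_energies, dtype=float)
    # Count decoys that score strictly better (lower) than the native.
    rank = 1 + int(np.sum(decoys < native_energy))
    # A negative z-score means the native lies below the decoy mean.
    z = (native_energy - decoys.mean()) / decoys.std()
    return rank, z

# Hypothetical energies, for illustration only.
rank, z = native_rank_and_zscore(-120.0, [-95.0, -101.5, -88.0, -110.0, -99.5])
print(rank, z)
```

A rank of 1 together with a strongly negative z-score would indicate that the potential separates the native fold from its decoys; other discrimination measures could be substituted without changing the overall idea.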