Vendruscolo M, Najmanovich R, Domany E
Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel.
Proteins. 2000 Feb 1;38(2):134-48. doi: 10.1002/(sici)1097-0134(20000201)38:2<134::aid-prot3>3.0.co;2-a.
We present a method to derive contact energy parameters from large sets of proteins. The method rests on one basic requirement: for each protein in the database, the native contact map must have lower energy than all of its decoy conformations obtained by threading. Only when this condition is satisfied can one use the proposed energy function for fold identification. Such a set of parameters can be found (by perceptron learning) if Mp, the number of proteins in the database, is not too large. Other aspects that influence the existence of such a solution are the exact definition of contact and the value of the critical distance Rc, below which two residues are considered to be in contact. Another important novel feature of our approach is its ability to determine whether an energy function of a suitable proposed form can be parameterized in a way that satisfies our basic requirement. As a demonstration of this, we determine the region in the (Rc, Mp) plane in which the problem is solvable, i.e., in which we can find a set of contact parameters that simultaneously stabilize all the native conformations. We show that for large enough databases the contact approximation to the energy cannot stabilize all the native folds, even against the decoys obtained by gapless threading.
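The requirement stated in the abstract can be read as a set of linear inequalities. If the contact energy of a conformation is the sum of one parameter per residue-type pair in contact, then the native map has lower energy than a decoy exactly when w · (c_decoy − c_native) > 0, where c counts the contacts of each pair type and w is the parameter vector. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the two-letter "HP" alphabet, the toy sequence, and the toy host structure are all hypothetical, and the abstract does not specify the perceptron variant, contact definition, or database used in the paper.

```python
from itertools import combinations_with_replacement


def contact_counts(sequence, contacts, alphabet):
    """Count the contacts of each unordered residue-type pair in a contact map."""
    pairs = list(combinations_with_replacement(alphabet, 2))
    index = {pair: k for k, pair in enumerate(pairs)}
    counts = [0] * len(pairs)
    for i, j in contacts:
        # Order the two residue types as in the alphabet so (P, H) maps to (H, P).
        a, b = sorted((sequence[i], sequence[j]), key=alphabet.index)
        counts[index[(a, b)]] += 1
    return counts


def gapless_decoy_maps(n, host_contacts, host_length):
    """Slide an n-residue window along a longer host structure, without gaps,
    and yield the contact map seen inside each window (re-indexed from 0)."""
    for offset in range(host_length - n + 1):
        yield [
            (i - offset, j - offset)
            for i, j in host_contacts
            if offset <= i < offset + n and offset <= j < offset + n
        ]


def learn_contact_energies(differences, max_epochs=1000):
    """Plain perceptron: look for w with w . x > 0 for every difference vector x.

    Returns w on success, or None if no solution was found within max_epochs.
    Hitting the epoch cap is NOT a proof that the problem is unsolvable.
    """
    w = [0.0] * len(differences[0])
    for _ in range(max_epochs):
        updated = False
        for x in differences:
            if sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                # Violated inequality: move w toward x.
                w = [wi + xi for wi, xi in zip(w, x)]
                updated = True
        if not updated:
            return w
    return None


# Toy data (hypothetical): one 4-residue sequence and one 6-residue host structure.
alphabet = "HP"
sequence = "HPPH"
native_contacts = [(0, 3)]
host_contacts = [(0, 2), (1, 3), (1, 4), (2, 4)]

native = contact_counts(sequence, native_contacts, alphabet)
differences = [
    [d - n for d, n in zip(contact_counts(sequence, decoy, alphabet), native)]
    for decoy in gapless_decoy_maps(len(sequence), host_contacts, host_length=6)
]
weights = learn_contact_energies(differences)
```

In this sketch `weights` holds one energy per pair type, ordered (H,H), (H,P), (P,P). A return value of `None` only means the epoch cap was reached, so this plain perceptron by itself cannot certify that no solution exists. With a real database, `differences` would collect one vector per (protein, decoy) pair across all Mp proteins.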