Aloy Patrick, Oliva Baldo
Institut de Recerca Biomèdica and Barcelona Supercomputing Center, 10-12 08028 Barcelona, Catalonia, Spain.
BMC Struct Biol. 2009 Nov 16;9:71. doi: 10.1186/1472-6807-9-71.
Recent advances in high-throughput technologies have produced a vast number of protein sequences, while the number of high-resolution structures has increased only modestly. This has prompted the development of many strategies to build protein structures from their sequences, generating a considerable number of alternative models. Selecting the model closest to the native conformation has thus become crucial for structure prediction. Several methods have been developed to score protein models by energies, knowledge-based potentials, and combinations of both.
Here, we present and demonstrate a theory to split knowledge-based potentials into biologically meaningful scoring terms and to combine these terms into new scores for predicting near-native structures. Our strategy circumvents the problem of defining the reference state. We give the proof for a simple, linear application of this approach, which can be further improved by optimizing the combination of Z-scores. Using the simplest composite score (ZECbeta), we obtained predictions similar to those of state-of-the-art methods. In addition, our approach has the advantage of identifying the terms most relevant to the stability of the protein structure. Finally, we also use the composite Z-scores to assess the conformation of models and to detect local errors.
We have introduced a method to split knowledge-based potentials and to solve the problem of defining a reference state. The new scores detected near-native structures as accurately as state-of-the-art methods and successfully identified wrongly modeled regions in many near-native conformations.
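As a rough illustration of what a linear combination of Z-scores can look like, the following minimal sketch standardizes each scoring term over a set of candidate models and sums the results with equal weights. It is not the authors' implementation: the abstract does not define ZECbeta or say how the Z-scores are standardized, so the term names (`pair_energy`, `local_energy`), the equal weights, the convention that lower values are more favorable, and the choice to standardize against the decoy set itself are all assumptions made here for illustration.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize one scoring term across all candidate models.

    Assumption: the mean and (population) standard deviation are taken
    over the decoy set itself, which the abstract does not specify.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A term that does not vary carries no ranking information.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def composite_zscore(terms, weights=None):
    """Linearly combine per-term Z-scores into one score per model.

    terms   : dict mapping a (hypothetical) term name to a list of
              per-model values, where lower means more favorable.
    weights : optional dict of per-term weights; defaults to 1.0 each.
    """
    names = list(terms)
    n_models = len(terms[names[0]])
    if any(len(terms[name]) != n_models for name in names):
        raise ValueError("every term needs one value per model")
    if weights is None:
        weights = {name: 1.0 for name in names}
    per_term = {name: zscores(terms[name]) for name in names}
    return [
        sum(weights[name] * per_term[name][i] for name in names)
        for i in range(n_models)
    ]


def best_model(terms, weights=None):
    """Return the index of the model with the lowest composite score."""
    scores = composite_zscore(terms, weights)
    return min(range(len(scores)), key=scores.__getitem__)


# Made-up values for three candidate models and two hypothetical terms.
example_terms = {
    "pair_energy": [-10.0, -5.0, 0.0],
    "local_energy": [-2.0, -3.0, 1.0],
}
example_scores = composite_zscore(example_terms)
example_best = best_model(example_terms)
```

Passing a `weights` dictionary instead of relying on the equal-weight default is one way to experiment with the idea of optimizing the combination; inspecting the per-term Z-scores of the selected model indicates which terms contribute most to its ranking.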