Zhu Jiang, Zhu Qianqian, Shi Yunyu, Liu Haiyan
Key Laboratory of Structural Biology, University of Science and Technology of China, Chinese Academy of Sciences, School of Life Sciences, Hefei, Anhui, 230026, China.
Proteins. 2003 Sep 1;52(4):598-608. doi: 10.1002/prot.10444.
One strategy for ab initio protein structure prediction is to generate a large number of possible structures (decoys) and select the best-fitting ones based on a scoring or free energy function. The conformational space of a protein is huge, and it is rare for a heuristically generated structure to fall directly in the neighborhood of the native structure. It is therefore desirable that the poorly fitting decoy structures, instead of being thrown away, provide insights into the native structure so that predictions can be made progressively. First, we demonstrate that a recently parameterized physics-based effective free energy function based on the GROMOS96 force field and a generalized Born/surface area solvent model is, like several other physics-based and knowledge-based models, capable of distinguishing native structures from decoy structures for a number of widely used decoy databases. Second, we observe a substantial increase in the correlation of the effective free energies with the degree of similarity between the decoys and the native structure if the similarity is measured by the content of native inter-residue contacts in a decoy structure rather than by its root-mean-square deviation from the native structure. Finally, we investigate the possibility of predicting native contacts based on the frequency of occurrence of contacts in decoy structures. For most proteins contained in the decoy databases, a meaningful number of native contacts can be predicted at a relatively high level of accuracy based on plain frequencies of occurrence. Relative to using plain frequencies, overwhelming improvements in the sensitivity of the predictions are observed for the 4_state_reduced decoy sets when energy-dependent weighting of decoy structures is applied in determining the frequency. There, approximately 80% of native contacts can be predicted at an accuracy of approximately 80% using energy-weighted frequencies, whereas the sensitivity of the plain frequency approach is much lower (20% to 40%). Such improvements are, however, not observed for the other decoy databases. Possible explanations for the results and their implications are discussed.
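The contact-based analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not specify several details, so the following are assumptions made only for illustration: contacts are defined by a distance cutoff of 8 Å between one representative atom per residue with a minimum sequence separation of 3; the energy-dependent weight of a decoy is taken to be Boltzmann-like, exp(-(E - E_min)/T), with a hypothetical `temperature` parameter; and a contact is predicted when its frequency reaches a hypothetical `threshold`. "Accuracy" is computed as the fraction of predicted contacts that are native, and "sensitivity" as the fraction of native contacts that are predicted.

```python
import numpy as np

def contact_map(coords, cutoff=8.0, min_sep=3):
    """Boolean contact map from (N, 3) per-residue coordinates.

    The cutoff (in angstroms) and minimum sequence separation are
    illustrative assumptions, not values taken from the paper.
    """
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    idx = np.arange(len(coords))
    far_in_sequence = np.abs(idx[:, None] - idx[None, :]) >= min_sep
    return (dist < cutoff) & far_in_sequence

def native_contact_fraction(decoy_map, native_map):
    """Fraction of native contacts that are also present in the decoy."""
    n_native = native_map.sum()
    return (decoy_map & native_map).sum() / n_native if n_native else 0.0

def contact_frequencies(decoy_maps, energies=None, temperature=1.0):
    """Per-pair contact frequency over a set of decoys.

    With energies=None every decoy counts equally (plain frequency).
    Otherwise each decoy is weighted by exp(-(E - E_min) / temperature),
    an assumed Boltzmann-like scheme chosen for this sketch.
    """
    maps = np.asarray(decoy_maps, dtype=float)
    if energies is None:
        weights = np.ones(len(maps))
    else:
        e = np.asarray(energies, dtype=float)
        weights = np.exp(-(e - e.min()) / temperature)
    weights = weights / weights.sum()
    # Weighted average of the decoy contact maps, shape (N, N).
    return np.tensordot(weights, maps, axes=1)

def evaluate(freq, native_map, threshold=0.5):
    """Return (accuracy, sensitivity) of contacts predicted at freq >= threshold."""
    predicted = np.triu(freq >= threshold, k=1)  # count each pair once
    native = np.triu(native_map, k=1)
    tp = (predicted & native).sum()
    accuracy = tp / predicted.sum() if predicted.sum() else 0.0
    sensitivity = tp / native.sum() if native.sum() else 0.0
    return accuracy, sensitivity
```

In this sketch, calling `contact_frequencies(decoy_maps)` gives the plain frequencies, while passing `energies` concentrates the weight on low-energy decoys. That weighting can only help when low energy actually correlates with native contact content, which is consistent with the abstract's observation that the improvement appears for some decoy sets and not for others.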