Kihara D, Lu H, Kolinski A, Skolnick J
Laboratory of Computational Genomics, Donald Danforth Plant Science Center, 893 North Warson Road, St. Louis, MO 63141, USA.
Proc Natl Acad Sci U S A. 2001 Aug 28;98(18):10125-30. doi: 10.1073/pnas.181328398. Epub 2001 Aug 14.
The successful prediction of protein structure from amino acid sequence requires two features: an efficient conformational search algorithm and an energy function with a global minimum in the native state. As a step toward addressing both issues, a threading-based method of secondary and tertiary restraint prediction has been developed and applied to ab initio folding. Such restraints are derived by extracting consensus contacts and local secondary structure from at least weakly scoring structures that, in some cases, can lack any global similarity to the sequence of interest. Furthermore, to generate representative protein structures, a reduced lattice-based protein model is used with replica exchange Monte Carlo to explore conformational space. We report results on the application of this methodology, termed TOUCHSTONE, to 65 proteins whose lengths range from 39 to 146 residues. For 47 (40) proteins, a cluster centroid whose rms deviation from native is below 6.5 (5) Å is found in one of the five lowest energy centroids. The number of correctly predicted proteins increases to 50 when atomic detail is added and a knowledge-based atomic potential is combined with clustered and nonclustered structures for candidate selection. The combination of the ratio of the relative number of contacts to the protein length and the number of clusters generated by the folding algorithm is a reliable indicator of the likelihood of successful fold prediction, thereby opening the way for genome-scale ab initio folding.
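The abstract names replica exchange Monte Carlo as the conformational search but does not give its details. As a rough illustration only, the sketch below shows the standard textbook swap test between replicas at neighboring temperatures. It is not the TOUCHSTONE implementation: the function names (swap_probability, attempt_swaps), the temperature ladder, and the energy units (Boltzmann constant taken as 1) are all assumptions made for the example.

```python
import math
import random


def swap_probability(energy_i, energy_j, temp_i, temp_j):
    """Standard replica exchange acceptance probability (assumes k_B = 1).

    energy_i is the energy of the conformation currently at temp_i,
    energy_j the energy of the conformation currently at temp_j.
    """
    beta_i = 1.0 / temp_i
    beta_j = 1.0 / temp_j
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # Metropolis rule: always accept when delta >= 0, else accept with exp(delta).
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swaps(energies, temps, rng):
    """Try one sweep of swaps between neighboring temperatures.

    energies[c] is the energy of conformation c; initially conformation k
    sits at temps[k]. Returns a list `order` where order[k] is the index of
    the conformation assigned to temps[k] after the sweep.
    """
    order = list(range(len(temps)))
    for k in range(len(temps) - 1):
        a, b = order[k], order[k + 1]
        p = swap_probability(energies[a], energies[b], temps[k], temps[k + 1])
        if rng.random() < p:
            order[k], order[k + 1] = b, a
    return order


if __name__ == "__main__":
    # Hypothetical energies and temperature ladder, for illustration only.
    energies = [-10.0, -20.0, -15.0]
    temps = [1.0, 2.0, 4.0]
    print(attempt_swaps(energies, temps, random.Random(0)))
```

In this scheme a low-energy conformation found at high temperature tends to migrate down the temperature ladder, while trapped conformations move up and can escape local minima. In a full simulation each replica would also perform ordinary Monte Carlo moves between swap attempts, which the sketch omits.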