An Yuling, Friesner Richard A
Department of Chemistry and Center for Biomolecular Simulation, Columbia University, New York, New York 10027, USA.
Proteins. 2002 Aug 1;48(2):352-66. doi: 10.1002/prot.10145.
In this work, we introduce a new method for fold recognition that uses composite secondary structures assembled from the output of different secondary structure prediction servers for a given target sequence. We develop an automatic, complete, and robust procedure that finds all possible combinations of predicted secondary structure segments (SSS) for the target sequence and groups them into a few flexible clusters, each containing patterns with the same number of SSS. The program then chooses plausible homologues in two steps: (i) an SSS-based alignment excludes impossible templates, whose SSS patterns are very different from every pattern of the target; (ii) a residue-based alignment selects good structural templates according to sequence similarity and secondary structure similarity between the target and only those templates that survive the first step. The secondary structure of each residue in the target is selected from one of the predictions so as to give the best match with the template. Truncation is applied to a target where the different predictions vary. In most cases, a target is also divided into N-terminal and C-terminal fragments, each of which is used as a separate subsequence. Our program was tested on the fold recognition targets from CASP3 with known PDB codes and on some available targets from CASP4. The results are compared with a structural homologue list for each target produced by the CE program (Shindyalov and Bourne, Protein Eng 1998;11:739-747). In the overwhelming majority of cases, the program locates homologues with a high Z-score and a low root-mean-square deviation within the top 30-50 predictions.
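The two-step selection described above can be illustrated with a minimal Python sketch. It is not the authors' program: the three-state alphabet (H, E, C), the minimum segment length, the edit-distance threshold, and every function name below are assumptions made for illustration. Step (ii) is reduced to counting secondary structure agreement over an already aligned, ungapped pair and leaves out the sequence similarity term that the abstract mentions.

```python
from itertools import groupby


def sss_pattern(ss, min_len=3):
    """Collapse a per-residue string (H/E/C) into an ordered pattern of
    helix/strand segments, ignoring coil and runs shorter than min_len.
    The min_len cutoff is an assumed parameter."""
    pattern = []
    for state, run in groupby(ss):
        if state in "HE" and len(list(run)) >= min_len:
            pattern.append(state)
    return "".join(pattern)


def edit_distance(a, b):
    """Plain Levenshtein distance between two segment patterns."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def passes_sss_filter(target_patterns, template_ss, max_diff=1):
    """Step (i), simplified: keep a template if its SSS pattern is close
    to at least one of the target's candidate patterns.
    max_diff is an assumed threshold."""
    template_pattern = sss_pattern(template_ss)
    return any(edit_distance(p, template_pattern) <= max_diff
               for p in target_patterns)


def best_match_score(predictions, template_ss):
    """Step (ii), simplified: at each position, pick whichever prediction
    agrees with the template and count the agreeing positions.
    Assumes all strings are pre-aligned and of equal length."""
    return sum(any(p[i] == t for p in predictions)
               for i, t in enumerate(template_ss))


# Two hypothetical server predictions for the same 12-residue target.
predictions = ["CHHHHCCEEEEC", "CHHHHCCCCCCC"]
target_patterns = {sss_pattern(p) for p in predictions}

# Hypothetical template secondary structures.
similar_template = "CHHHHCCEEECC"
dissimilar_template = "CEEEECEEEECCEEEEC"

keep_similar = passes_sss_filter(target_patterns, similar_template)
keep_dissimilar = passes_sss_filter(target_patterns, dissimilar_template)
score_similar = best_match_score(predictions, similar_template)
```

In this sketch the inexpensive pattern comparison runs first, so the per-residue scoring is applied only to templates that survive it, which mirrors the ordering of steps (i) and (ii).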