Williams M G, Shirai H, Shi J, Nagendra H G, Mueller J, Mizuguchi K, Miguel R N, Lovell S C, Innis C A, Deane C M, Chen L, Campillo N, Burke D F, Blundell T L, de Bakker P I
Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom.
Proteins. 2001;Suppl 5:92-7. doi: 10.1002/prot.1169.
Our approach to fold recognition in the fourth Critical Assessment of Techniques for Protein Structure Prediction (CASP4) experiment used the FUGUE sequence-structure homology recognition program (http://www-cryst.bioc.cam.ac.uk/fugue), followed by model building. We treat models as hypotheses and test whether each one explains the available data. Our method depends heavily on environment-specific substitution tables derived from HOMSTRAD, our database of structural alignments of homologous proteins (http://www-cryst.bioc.cam.ac.uk/homstrad/). FUGUE uses these tables to incorporate structural information into profiles built from HOMSTRAD alignments; these profiles are then matched against a profile built for the target from its multiple sequence alignment. The same environment-specific substitution tables are also used throughout the modeling procedure and as part of model evaluation. Annotating sequence alignments with JOY to show local structural features proved valuable, both for revising hypotheses and for rejecting predictions in which the expected pattern of conservation was not observed. Because we were strict in rejecting incorrect predictions, we submitted relatively few models and very few false positives, which resulted in a high average score.
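The core idea of scoring a target against a structure-derived profile can be illustrated with a small sketch. It assumes a hypothetical two-class environment scheme ("buried"/"exposed") and a made-up `toy_substitution_score` rule in place of the real HOMSTRAD-derived tables. The real environment classes, table values and gap handling used by FUGUE are not reproduced here. Each profile column averages the environment-specific score over the residues observed at that position in an aligned family. A single target sequence (rather than a target profile) is then aligned to it with plain Needleman-Wunsch and a linear gap penalty.

```python
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")


def toy_substitution_score(env: str, native: str, substituted: str) -> float:
    """Hypothetical stand-in for one entry of an environment-specific
    substitution table: score for seeing `substituted` at a site where the
    structure has `native` in environment `env`. Not FUGUE's real values."""
    score = 4.0 if native == substituted else 0.0
    if env == "buried":
        score += 1.0 if substituted in HYDROPHOBIC else -2.0
    else:  # "exposed"
        score += 1.0 if substituted not in HYDROPHOBIC else -1.0
    return score


def build_profile(columns, environments):
    """One row per structural alignment column: for each amino acid, the mean
    environment-specific score over the residues observed in that column."""
    return [
        {aa: mean(toy_substitution_score(env, r, aa) for r in column) for aa in AMINO_ACIDS}
        for column, env in zip(columns, environments)
    ]


def align_score(profile, target, gap=-3.0):
    """Global (Needleman-Wunsch) alignment score of a target sequence against
    the profile, using a single linear gap penalty (a simplification)."""
    n, m = len(profile), len(target)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + profile[i - 1][target[j - 1]],  # match column i to residue j
                dp[i - 1][j] + gap,  # gap in the target
                dp[i][j - 1] + gap,  # gap in the profile
            )
    return dp[n][m]


# Hypothetical three-member family, already structurally aligned (no gaps),
# with one shared environment label per column.
family = ["LVKEA", "IVREA", "LIKDA"]
environments = ["buried", "buried", "exposed", "exposed", "buried"]
profile = build_profile(list(zip(*family)), environments)
print(align_score(profile, "LVKEA"))  # family-like target
print(align_score(profile, "DEGSP"))  # polar residues at buried sites
```

In this toy setup the family-like target scores well above the sequence that places polar residues at buried sites. This mirrors the intuition that environment-specific tables reward substitutions compatible with the structural context, not just identity.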