Murzin A G, Bateman A
Center for Protein Engineering, MRC Center, Cambridge, United Kingdom.
Proteins. 1997;Suppl 1:105-12. doi: 10.1002/(sici)1097-0134(1997)1+<105::aid-prot14>3.3.co;2-1.
Protein structure prediction is arguably the biggest unsolved problem in structural biology. The notion that the number of naturally occurring protein folds is limited allows a partial solution to this problem through fold recognition methods, which "thread" the query sequence through a library of known protein folds. Fold recognition methods were thought to be superior to distant homology recognition methods when there is no significant sequence similarity to known structures. We show here that the Structural Classification of Proteins (SCOP) database, which organizes all known protein folds according to their structural and evolutionary relationships, can be used effectively to enhance the sensitivity of distant homology recognition methods so that they rival the "threading" methods. In the CASP2 experiment, our approach correctly assigned all six of the "fold recognition" targets we attempted to existing SCOP superfamilies. For each of the six targets, we correctly predicted a homologous protein with a very similar structure; often, it was the most similar structure. We correctly predicted local alignments of the sequence features that we found to be characteristic of the protein superfamily containing a given target. Our global alignments, extended manually from these local alignments, also appeared to be rather accurate.