Hvidsten Torgeir R, Kryshtafovych Andriy, Komorowski Jan, Fidelis Krzysztof
The Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden.
Bioinformatics. 2003 Oct;19 Suppl 2:ii81-91. doi: 10.1093/bioinformatics/btg1064.
Comparative modeling methods can consistently produce reliable structural models for protein sequences with more than 25% sequence identity to proteins with known structure. However, sequences with lower sequence identity are also likely to have their structural components represented in structural databases. To exploit this, we present a novel fragment-based method that uses sets of structurally similar local fragments of proteins. The approach differs from other fragment-based methods that use only single backbone fragments: we use a library of groups, each containing a set of sequence fragments with geometrically similar local structures, and extract sequence-related properties to assign these specific geometrical conformations to target sequences. We test the ability of the approach to recognize the correct SCOP fold for 273 sequences from the 49 most popular folds. Of these sequences, 49% have the correct fold as their top prediction, and 82% have the correct fold among their top five predictions. Moreover, the approach shows no reduction in performance on a subset of target sequences with less than 10% sequence identity to any protein used to build the library.
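The fold-recognition results in the abstract are reported as top-1 and top-5 accuracy: the fraction of target sequences whose correct SCOP fold is ranked first, or appears among the first five predictions. A minimal Python sketch of that metric follows. The function name `top_k_accuracy`, the dictionary layout, and the toy fold labels are assumptions made for illustration; they are not taken from the paper and do not reproduce its data or its prediction method.

```python
from typing import Dict, List


def top_k_accuracy(ranked: Dict[str, List[str]], truth: Dict[str, str], k: int) -> float:
    """Fraction of targets whose true fold is among their first k ranked predictions."""
    if not truth:
        raise ValueError("no targets to evaluate")
    hits = sum(
        1
        for target, fold in truth.items()
        if fold in ranked.get(target, [])[:k]  # a target with no predictions counts as a miss
    )
    return hits / len(truth)


# Toy, made-up example (hypothetical labels, not results from the paper).
ranked = {
    "seq1": ["foldA", "foldB", "foldC"],  # correct fold ranked first
    "seq2": ["foldB", "foldA", "foldC"],  # correct fold ranked second
    "seq3": ["foldC", "foldB", "foldA"],  # correct fold ranked third
    "seq4": ["foldB", "foldC", "foldD"],  # correct fold never predicted
}
truth = {"seq1": "foldA", "seq2": "foldA", "seq3": "foldA", "seq4": "foldA"}

top1 = top_k_accuracy(ranked, truth, 1)
top2 = top_k_accuracy(ranked, truth, 2)
top3 = top_k_accuracy(ranked, truth, 3)
print(top1, top2, top3)
```

With `k=1` and `k=5` applied to a ranked fold list per target, the same function would yield the two kinds of figures the abstract quotes, assuming each target has exactly one correct fold.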