Fischer D, Eisenberg D
UCLA-DOE Laboratory of Structural Biology & Molecular Medicine, Molecular Biology Institute 90095-1570, USA.
Protein Sci. 1996 May;5(5):947-55. doi: 10.1002/pro.5560050516.
In protein fold recognition, one assigns a probe amino acid sequence of unknown structure to one of a library of target 3D structures. Correct assignment depends on effective scoring of the probe sequence for its compatibility with each of the target structures. Here we show that, in addition to the amino acid sequence of the probe, sequence-derived properties of the probe sequence (such as the predicted secondary structure) are useful in fold assignment. The additional measure of compatibility between probe and target is the level of agreement between the predicted secondary structure of the probe and the known secondary structure of the target fold. That is, we recommend a sequence-structure compatibility function that combines previously developed compatibility functions (such as the 3D-1D scores of Bowie et al. [1991] or sequence-sequence replacement tables) with the predicted secondary structure of the probe sequence. The effect on fold assignment of adding predicted secondary structure is evaluated here by using a benchmark set of proteins (Fischer et al., 1996a). The 3D structures of the probe sequences of the benchmark are actually known, but are ignored by our method. The results show that the inclusion of the predicted secondary structure improves fold assignment by about 25%. The results also show that, if the true secondary structure of the probe were known, correct fold assignment would increase by an additional 8-32%. We conclude that incorporating sequence-derived predictions significantly improves assignment of sequences to known 3D folds. Finally, we apply the new method to assign folds to sequences in the SWISSPROT database; six fold assignments are given that are not detectable by standard sequence-sequence comparison methods; for two of these, the fold is known from X-ray crystallography and the fold assignment is correct.
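The combined compatibility function described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the probe and target are already aligned position by position without gaps (an actual fold-recognition method would search over alignments), and the names `combined_score`, `ss_agreement`, `identity_score`, and `assign_fold`, the three-state alphabet (H, E, C), and all numeric values are hypothetical choices made for illustration only.

```python
from typing import Callable, Dict, Tuple


def ss_agreement(pred: str, known: str,
                 match: float = 1.0, mismatch: float = -1.0) -> float:
    """Score one position: predicted probe state vs. known target state.

    States are assumed to be H (helix), E (strand), or C (coil).
    The match/mismatch values are illustrative, not taken from the paper.
    """
    return match if pred == known else mismatch


def identity_score(a: str, b: str) -> float:
    """Toy stand-in for a replacement table or 3D-1D score (hypothetical)."""
    return 2.0 if a == b else -1.0


def combined_score(probe_seq: str, probe_ss_pred: str,
                   target_seq: str, target_ss: str,
                   base_score: Callable[[str, str], float],
                   ss_weight: float = 1.0) -> float:
    """Sum, over aligned positions, the base compatibility score plus a
    weighted secondary-structure agreement term."""
    lengths = {len(probe_seq), len(probe_ss_pred), len(target_seq), len(target_ss)}
    if len(lengths) != 1:
        raise ValueError("this sketch requires equal-length, pre-aligned inputs")
    total = 0.0
    for a, p, b, k in zip(probe_seq, probe_ss_pred, target_seq, target_ss):
        total += base_score(a, b) + ss_weight * ss_agreement(p, k)
    return total


def assign_fold(probe_seq: str, probe_ss_pred: str,
                library: Dict[str, Tuple[str, str]],
                base_score: Callable[[str, str], float],
                ss_weight: float = 1.0) -> str:
    """Return the name of the library entry with the highest combined score.

    `library` maps a fold name to (target sequence, known secondary structure).
    """
    return max(
        library,
        key=lambda name: combined_score(probe_seq, probe_ss_pred,
                                        library[name][0], library[name][1],
                                        base_score, ss_weight),
    )


# Toy library: two made-up targets, each with a sequence and known states.
library = {
    "fold_A": ("ACDF", "CCCC"),
    "fold_B": ("ACGG", "HHEE"),
}

# Sequence alone (ss_weight=0) versus sequence plus predicted structure.
print(assign_fold("ACDE", "HHEE", library, identity_score, ss_weight=0.0))
print(assign_fold("ACDE", "HHEE", library, identity_score, ss_weight=1.0))
```

In this toy case the sequence term alone favours `fold_A`, while adding the secondary-structure agreement term shifts the assignment to `fold_B`, whose known states match the probe's predicted states. The relative weight `ss_weight` is a free parameter in this sketch; the abstract does not state how the two terms are weighted.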