Scheeff Eric D, Bourne Philip E
San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093-0537, USA.
BMC Bioinformatics. 2006 Sep 14;7:410. doi: 10.1186/1471-2105-7-410.
One of the most powerful methods for the prediction of protein structure from sequence information alone is the iterative construction of profile-type models. Because profiles are built from sequence alignments, the sequences included in the alignment and the method used to align them will be important to the sensitivity of the resulting profile. The inclusion of highly diverse sequences will presumably produce a more powerful profile, but distantly related sequences can be difficult to align accurately using only sequence information. Therefore, it would be expected that the use of protein structure alignments to improve the selection and alignment of diverse sequence homologs might yield improved profiles. However, the actual utility of such an approach has remained unclear.
We explored several iterative protocols for the generation of profile hidden Markov models. These protocols were tailored to allow the inclusion of protein structure alignments in the process, and were used for large-scale creation and benchmarking of structure alignment-enhanced models. We found that models using structure alignments did not provide an overall improvement over sequence-only models for superfamily-level structure predictions. However, the results also revealed that the structure alignment-enhanced models were complementary to the sequence-only models, particularly at the edge of the "twilight zone". When the two sets of models were combined, they provided improved results over sequence-only models alone. In addition, we found that the beneficial effects of the structure alignment-enhanced models could not be realized if the structure-based alignments were replaced with sequence-based alignments. Our experiments with different iterative protocols for sequence-only models also suggested that simple protocol modifications could not yield improvements equivalent to those provided by the structure alignment-enhanced models. Finally, we found that models using structure alignments provided fold-level structure assignments that were superior to those produced by sequence-only models.
When attempting to predict the structure of remote homologs, we advocate a combined approach in which both traditional models and models incorporating structure alignments are used.
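The abstract does not state how the two sets of models were combined, so the following is only a minimal sketch of one plausible rule: for each query, keep the best-scoring hit from either model library. The `Hit` type, the `combine_hits` function, and the `max_evalue` cutoff are all hypothetical names and values introduced for illustration, not the authors' procedure.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Hit(NamedTuple):
    """One model-to-query match (hypothetical record layout)."""
    query: str        # identifier of the query sequence
    superfamily: str  # superfamily label of the matching model
    evalue: float     # E-value of the match; lower is better


def combine_hits(
    seq_only: Iterable[Hit],
    struct_enhanced: Iterable[Hit],
    max_evalue: float = 0.01,  # assumed cutoff, not taken from the paper
) -> Dict[str, Tuple[Hit, str]]:
    """Return, per query, the lowest-E-value hit and the library it came from."""
    best: Dict[str, Tuple[Hit, str]] = {}
    libraries = (
        ("sequence-only", seq_only),
        ("structure-enhanced", struct_enhanced),
    )
    for source, hits in libraries:
        for hit in hits:
            if hit.evalue > max_evalue:
                continue  # discard hits weaker than the cutoff
            current = best.get(hit.query)
            # Keep the new hit only if it is strictly better than the stored one.
            if current is None or hit.evalue < current[0].evalue:
                best[hit.query] = (hit, source)
    return best


if __name__ == "__main__":
    seq_hits = [Hit("q1", "a.1", 1e-5), Hit("q2", "b.2", 0.5)]
    struct_hits = [Hit("q1", "a.1", 1e-3), Hit("q2", "b.2", 1e-4)]
    for query, (hit, source) in sorted(combine_hits(seq_hits, struct_hits).items()):
        print(query, hit.superfamily, hit.evalue, source)
```

Under this rule, a query that only the structure alignment-enhanced models detect below the cutoff (here `q2`) still receives an assignment, which is one way the complementarity described above could translate into improved combined results.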