Toyota Technological Institute at Chicago, 6045 S Kenwood, Chicago, Illinois 60637, USA.
Proteins. 2011 Jun;79(6):1930-9. doi: 10.1002/prot.23016. Epub 2011 Apr 4.
Most threading methods predict the structure of a protein using only a single template. Because the number of solved structures keeps growing, a protein without a solved structure is very likely to have more than one similar template structure. A natural question, therefore, is whether modeling accuracy can be improved by using multiple templates. This article describes a new multiple-template threading method to answer this question. At the heart of the method is a novel probabilistic-consistency algorithm that can accurately align a single protein sequence to multiple templates simultaneously. Experimental results indicate that our multiple-template method improves pairwise sequence-template alignment accuracy and generates better-quality models than single-template modeling, even when the single-template models are built from the best single templates (P-value < 10^-6), whereas many popular multiple sequence/structure alignment tools fail to do so. The underlying reason is that our probabilistic-consistency algorithm generates accurate multiple sequence/template alignments. In other words, without an accurate multiple sequence/template alignment, modeling accuracy cannot be improved simply by using multiple templates to increase alignment coverage. Blindly tested on the CASP9 targets with more than one good template structure, our method outperforms all other CASP9 servers except two (Zhang-Server and QUARK, both from the same group). Our probabilistic-consistency algorithm could possibly be extended to align multiple protein/RNA sequences and structures.
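The abstract does not spell out the probabilistic-consistency algorithm, so the following is only a minimal sketch of the general idea behind consistency-based alignment, in the style popularized by ProbCons, and not the authors' method. It assumes that posterior match probabilities between the query and each of two templates, and between the two templates, are already available as matrices. The function names (`matmul`, `consistency_transform`) and the toy matrices are hypothetical and chosen for illustration.

```python
def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def consistency_transform(p_q_t1, p_q_t2, p_t2_t1):
    """Re-estimate query-to-template-1 match probabilities using template 2.

    Assumed ProbCons-style update for three sequences:
        P'(q_i ~ t1_j) = (2 * P(q_i ~ t1_j)
                          + sum_k P(q_i ~ t2_k) * P(t2_k ~ t1_j)) / 3
    The factor 2 comes from treating the query and template 1 themselves
    as intermediates, each contributing the direct probability once.
    """
    via_t2 = matmul(p_q_t2, p_t2_t1)
    return [[(2.0 * p_q_t1[i][j] + via_t2[i][j]) / 3.0
             for j in range(len(p_q_t1[0]))]
            for i in range(len(p_q_t1))]


# Toy input: the direct query/template-1 posteriors are ambiguous, but both
# the query and template 1 align confidently to template 2.
p_q_t1 = [[0.5, 0.5], [0.5, 0.5]]
p_q_t2 = [[1.0, 0.0], [0.0, 1.0]]
p_t2_t1 = [[1.0, 0.0], [0.0, 1.0]]
refined = consistency_transform(p_q_t1, p_q_t2, p_t2_t1)
```

In this toy case the evidence routed through the second template shifts probability toward the diagonal matches, which illustrates why a consistent multiple sequence/template alignment can correct an ambiguous pairwise alignment rather than merely extend its coverage.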