Compugen LTD, Tel Aviv 69512, Israel.
Bioinformatics. 2011 Jul 15;27(14):1941-7. doi: 10.1093/bioinformatics/btr292. Epub 2011 May 17.
Prediction of interactions between protein residues (contact map prediction) can facilitate various aspects of 3D structure modeling. However, the accuracy of ab initio contact prediction is still limited. As structural genomics initiatives move ahead, solved structures of homologous proteins can be used as multiple templates to improve contact prediction of the major conformation of an unsolved target protein. Furthermore, multiple templates may provide a wider view of the protein's conformational space. However, using multiple structural templates successfully is not straightforward, because the templates vary in their relevance to the target protein and because of redundancy in their data.
We present here an algorithm that addresses these two limitations in the use of multiple structural templates. First, the algorithm unites contact maps extracted from templates that share high sequence similarity with each other, in a fashion that acknowledges the possibility of multiple conformations. Next, it weights the resulting united maps in inverse proportion to their evolutionary distance from the target protein. Testing this algorithm on CASP8 targets yielded high-precision contact maps. Remarkably, based solely on structural data of remote homologues, our algorithm identified residue-residue interactions that account for all the known conformations of calmodulin, a multifaceted protein. Therefore, employing multiple templates not only improves contact map prediction but can also reveal novel conformations. As multiple templates will soon be available for most proteins, our scheme offers an effective procedure for considering them optimally.
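The two steps described above (uniting maps of mutually similar templates, then weighting each united map by inverse evolutionary distance) can be pictured with a minimal Python sketch. It is not the authors' WMC Perl script. The grouping rule (single-linkage at a hypothetical sequence-identity threshold of 0.9), the use of a set union to merge maps within a group, taking a group's distance from its closest member, and normalizing the weights to sum to 1 are all illustrative assumptions. Contacts are assumed to be residue pairs already mapped onto target numbering, and distances are assumed to be positive numbers supplied by the caller.

```python
from itertools import combinations


def group_templates(identity, threshold=0.9):
    """Hypothetical grouping: templates i and j fall into one group when a chain
    of template pairs with sequence identity >= threshold links them."""
    n = len(identity)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if identity[i][j] >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def weighted_contact_map(contact_maps, identity, distances, threshold=0.9):
    """Score each target residue pair by the normalized inverse-distance weights
    of the united template maps that contain it (illustrative scheme only)."""
    if any(d <= 0 for d in distances):
        raise ValueError("evolutionary distances must be positive")
    united, weights = [], []
    for members in group_templates(identity, threshold):
        # Union keeps a contact seen in ANY similar template, so contacts from
        # alternative conformations are not averaged away (assumption)
        contacts = set().union(*(contact_maps[m] for m in members))
        # Assumed group distance: that of the member closest to the target
        dist = min(distances[m] for m in members)
        united.append(contacts)
        weights.append(1.0 / dist)  # inverse proportion to evolutionary distance
    total = sum(weights)
    scores = {}
    for contacts, w in zip(united, weights):
        for pair in contacts:
            scores[pair] = scores.get(pair, 0.0) + w / total
    return scores


if __name__ == "__main__":
    # Hypothetical data: templates 0 and 1 are near-identical but adopt
    # different conformations; template 2 is a remote homologue.
    maps = [{(1, 10), (2, 12)}, {(1, 10), (5, 20)}, {(1, 10), (2, 12), (7, 30)}]
    ident = [[1.0, 0.95, 0.4], [0.95, 1.0, 0.35], [0.4, 0.35, 1.0]]
    dists = [0.5, 0.6, 2.0]
    for pair, score in sorted(weighted_contact_map(maps, ident, dists).items()):
        print(pair, round(score, 3))
```

In this toy run, the contact (5, 20), found only in one conformation of the close group, keeps the full weight of that group (0.8) instead of being diluted by its near-identical partner. The remote template alone contributes only 0.2 to (7, 30). This mirrors, under the stated assumptions, how uniting before weighting avoids both redundancy and the loss of alternative conformations.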
A Perl script implementing the WMC algorithm described in this article is freely available for academic use at http://tau.ac.il/~haimash/WMC.