Kundrotas Petras J, Alexov Emil
Computational Biophysics and Bioinformatics, Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA.
Biochim Biophys Acta. 2006 Sep;1764(9):1498-511. doi: 10.1016/j.bbapap.2006.08.002. Epub 2006 Aug 10.
The paper reports a homology-based approach for predicting the 3D structures of full-length hetero protein complexes. We have created a database of templates that includes structures of hetero protein-protein complexes as well as domain-domain structures, which allowed us to expand the template pool to 418 two-chain entries (at 40% sequence identity). Two protocols were tested: one based on position-specific BLAST search (Protocol-I) and one based on structural similarity of monomers (Protocol-II). All possible combinations of two monomers (350,284 pairs) in the ProtCom database were subjected to both protocols to predict whether they form complexes. The predictions were benchmarked against the ProtCom database, resulting in false-to-true positive ratios of approximately 5:1 and 7:1 and recovery of 19% and 86% for Protocols I and II, respectively. Out of 350,284 trials, Protocol-I made only approximately 500 wrong predictions, resulting in a 0.5% error. In addition, although it was shown that artificially created domain-domain structures can in principle be good templates for modeling full-length protein complexes, more sensitive methods are needed to detect homology relations. The quality of the models was assessed using two criteria: interfacial residues and overall RMSD. No correlation was found between these two measures. In many cases the interface residues were predicted correctly but the overall RMSD was over 6 Å, and vice versa.
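The two benchmark figures quoted in the abstract (the false-to-true positive ratio and the recovery) can be illustrated with a minimal Python sketch. It assumes that recovery means the fraction of known complexes that a protocol predicts, which the abstract does not define explicitly. The function name `benchmark` and the toy chain labels are hypothetical and are not taken from the paper or from ProtCom.

```python
def benchmark(predicted, known):
    """Compare predicted pairs against known complexes.

    Assumption: a pair is unordered, so (A, B) and (B, A) are the same pair.
    Assumption: recovery = true positives / number of known complexes.
    Returns (true_pos, false_pos, recovery, fp_to_tp_ratio).
    """
    predicted = {frozenset(pair) for pair in predicted}
    known = {frozenset(pair) for pair in known}

    true_pos = len(predicted & known)   # predicted and present in the benchmark
    false_pos = len(predicted - known)  # predicted but absent from the benchmark

    recovery = true_pos / len(known) if known else 0.0
    ratio = false_pos / true_pos if true_pos else float("inf")
    return true_pos, false_pos, recovery, ratio


# Toy data with hypothetical chain labels, for illustration only.
known_pairs = [("A", "B"), ("C", "D"), ("E", "F"), ("G", "H")]
predicted_pairs = [("B", "A"), ("C", "D"), ("A", "C"), ("A", "D"), ("B", "E")]

tp, fp, recovery, ratio = benchmark(predicted_pairs, known_pairs)
print(f"true positives: {tp}, false positives: {fp}")
print(f"recovery: {recovery:.0%}, false-to-true positive ratio: {ratio:.1f}:1")
```

In this toy case two of the four known pairs are recovered and three predictions are wrong, so the recovery is 50% and the ratio is 1.5:1. The same bookkeeping explains how a protocol can pair a high recovery with a high false-to-true positive ratio.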