Center for Computational Biology and Bioinformatics, College of Engineering, Koc University, Istanbul, Turkey.
Proteins. 2012 Apr;80(4):1239-49. doi: 10.1002/prot.24022. Epub 2012 Jan 31.
The similarity between folding and binding led us to posit that the number of protein-protein interface motifs in nature is limited, and that interacting protein pairs can reuse similar interface architectures even if their global folds differ completely. Thus, known protein-protein interface architectures can be used to model the complexes between two target proteins on the proteome scale, even if their global structures differ. This powerful concept is combined with a flexible refinement and global energy assessment tool. The accuracy of the method is highly dependent on the structural diversity of the interface architectures in the template dataset. Here, we validate this knowledge-based combinatorial method on the Docking Benchmark and show that it efficiently finds high-quality models for benchmark complexes and their binding regions, even in the absence of template interfaces having sequence similarity to the targets. Compared to "classical" docking, it is computationally faster; as the number of target proteins increases, the difference becomes more dramatic. Further, it is able to distinguish binders from nonbinders. These features allow large-scale network modeling. The results on an independent target set (proteins in the p53 molecular interaction map) show that the current method can be used to predict whether a given protein pair interacts. Overall, while constrained by the diversity of the template set, this approach efficiently produces high-quality models of protein-protein complexes. We expect that, with the growing number of known interface architectures, knowledge-based methods of this type will be increasingly used by the broad proteomics community.