Kim Wan Kyu, Henschel Andreas, Winter Christof, Schroeder Michael
Bioinformatics Group, Biotechnological Centre, Technische Universität Dresden, Dresden, Germany.
PLoS Comput Biol. 2006 Sep 29;2(9):e124. doi: 10.1371/journal.pcbi.0020124. Epub 2006 Jul 31.
A systematic classification of protein-protein interfaces is a valuable resource for understanding the principles of molecular recognition and for modelling protein complexes. Here, we present a classification of domain interfaces according to their geometry. Our new algorithm uses a hybrid approach that combines sequential and structural features. Its accuracy is evaluated on a hand-curated dataset of 416 interfaces. The hybrid procedure achieves 83% precision and 95% recall, an improvement of 5% in both measures over the earlier sequence-based method. We classify virtually all domain interfaces of known structure, which results in nearly 6,000 distinct types of interfaces. In 40% of the cases, the interacting domain families associate in multiple orientations, suggesting that all possible binding orientations need to be explored when modelling multidomain proteins and protein complexes. In general, hub proteins are shown to use distinct surface regions (multiple faces) for interactions with different partners. Our classification provides a convenient framework to query genuine gene fusion, in which the binding orientation is conserved in both the fused and the separate forms. The results suggest that binding orientations are not conserved in at least one-third of the gene fusion cases detected by a conventional sequence similarity search. We show that any evolutionary analysis of interfaces can be skewed by multiple binding orientations and multiple interaction partners. The taxonomic distribution of interface types suggests that ancient interfaces common to the three major kingdoms of life are enriched in symmetric homodimers. The classification results are available online at http://www.scoppi.org.
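The precision and recall figures quoted above compare a computed classification against a hand-curated one. The abstract does not state how the two measures were counted, so the following is only a minimal sketch under the assumption of a pair-counting scheme: two interfaces form a positive pair when they share a class. The function name, the interface identifiers, and the toy labels are hypothetical and are not taken from the paper or from SCOPPI.

```python
from itertools import combinations


def pairwise_precision_recall(predicted, reference):
    """Pair-counting precision and recall of a clustering.

    predicted, reference: dicts mapping an interface id to a class label.
    Assumption: this pair-counting scheme is illustrative only; it is not
    confirmed to be the evaluation protocol used by the authors.
    """
    items = sorted(set(predicted) & set(reference))
    tp = fp = fn = 0
    for a, b in combinations(items, 2):
        same_pred = predicted[a] == predicted[b]
        same_ref = reference[a] == reference[b]
        if same_pred and same_ref:
            tp += 1  # pair grouped together in both classifications
        elif same_pred:
            fp += 1  # grouped by the algorithm but not by the curators
        elif same_ref:
            fn += 1  # grouped by the curators but split by the algorithm
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data: five interfaces, two curated classes.
reference = {"i1": "A", "i2": "A", "i3": "A", "i4": "B", "i5": "B"}
predicted = {"i1": 1, "i2": 1, "i3": 2, "i4": 2, "i5": 2}

precision, recall = pairwise_precision_recall(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the algorithm wrongly moves `i3` into the second cluster, which both splits a curated class (lowering recall) and merges members of different classes (lowering precision).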