BMC Bioinformatics. 2013;14 Suppl 16(Suppl 16):S8. doi: 10.1186/1471-2105-14-S16-S8. Epub 2013 Oct 22.
Protein complexes conserved across species indicate processes that are core to cellular machinery (e.g. cell-cycle or DNA damage-repair complexes conserved across human and yeast). While numerous computational methods have been devised to identify complexes from the protein-protein interaction (PPI) networks of individual species, these are severely limited by noise and errors (false positives) in currently available datasets. Our analysis using human and yeast PPI networks revealed that these methods missed several important complexes, including those conserved between the two species (e.g. the MLH1-MSH2-PMS2-PCNA mismatch-repair complex). Here, we note that much of the functionality of yeast complexes has been conserved in human complexes not only through sequence conservation of proteins but also through conservation of critical functional domains. Therefore, integrating information on domain conservation might shed further light on conservation patterns between yeast and human complexes.
We identify conserved complexes by constructing an interolog network (IN) that leverages the functional conservation of proteins between species through domain conservation (from Ensembl) in addition to sequence similarity. We employ state-of-the-art methods to cluster the interolog network, and map these clusters back to the original PPI networks to identify complexes conserved between the species. Evaluation of our IN-based approach (called COCIN) on human and yeast interaction data identifies several additional complexes (76% recall) compared to direct complex detection from the original PPI networks (54% recall). Our analysis revealed that the IN construction removes several non-conserved interactions, many of which are false positives, thereby improving complex prediction. In fact, removing non-conserved interactions from the original PPI networks also resulted in a higher number of conserved complexes, thereby validating our IN-based approach. These complexes included the mismatch-repair complex MLH1-MSH2-PMS2-PCNA and other important ones, namely the RNA polymerase-II, EIF3 and MCM complexes, all of which constitute core cellular processes known to be conserved across the two species.
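The construct-cluster-map-back pipeline described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: IN nodes are (yeast, human) ortholog pairs, an edge requires the interaction to be present in both species, and connected components stand in for the clustering methods actually used. All function names and the toy protein identifiers (`yA`, `hA`, ...) are hypothetical.

```python
from itertools import combinations


def build_interolog_network(yeast_ppi, human_ppi, orthologs):
    """Build a toy interolog network.

    Nodes are (yeast, human) ortholog pairs. Assumed edge rule: two nodes are
    joined only when the yeast proteins interact AND the human proteins
    interact, so non-conserved interactions are dropped.
    """
    yeast_edges = {frozenset(e) for e in yeast_ppi}
    human_edges = {frozenset(e) for e in human_ppi}
    nodes = sorted(set(orthologs))
    edges = set()
    for (y1, h1), (y2, h2) in combinations(nodes, 2):
        if y1 == y2 or h1 == h2:
            continue  # skip pairs that share a protein in either species
        if frozenset((y1, y2)) in yeast_edges and frozenset((h1, h2)) in human_edges:
            edges.add(((y1, h1), (y2, h2)))
    return nodes, edges


def connected_components(nodes, edges):
    """Stand-in clustering: connected components with at least one edge."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, clusters = set(), []
    for start in nodes:
        if start in seen or not adjacency[start]:
            continue  # already clustered, or an isolated node
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


def map_back(cluster):
    """Project an IN cluster onto its yeast and human protein sets."""
    return {y for y, _ in cluster}, {h for _, h in cluster}


# Toy data (hypothetical identifiers, not real proteins).
yeast_ppi = [("yA", "yB"), ("yB", "yC"), ("yC", "yD")]
human_ppi = [("hA", "hB"), ("hB", "hC")]
orthologs = [("yA", "hA"), ("yB", "hB"), ("yC", "hC"), ("yD", "hD")]

nodes, edges = build_interolog_network(yeast_ppi, human_ppi, orthologs)
clusters = connected_components(nodes, edges)
conserved = [map_back(c) for c in clusters]
print(conserved)
```

In this toy input the yeast-only interaction `yC`-`yD` has no human counterpart, so it never enters the IN, which mirrors the filtering of non-conserved interactions described above.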
Our method, which integrates domain conservation and sequence similarity to construct interolog networks, helps to identify considerably more conserved complexes between the PPI networks of two species than direct complex prediction from the PPI networks. We observe from our experiments that protein complexes are not conserved from yeast to human in a straightforward way; that is, a yeast complex is not simply a (proper) subset of a human complex, with a few additional proteins present in the human complex. Instead, complexes have evolved multifold, with considerable re-organization of proteins and re-distribution of their functions across complexes. This finding can have significant implications for attempts to extrapolate other kinds of relationships, such as synthetic lethality, from yeast to human, for example in the identification of novel cancer targets.
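To make the "proper subset" notion concrete, the following sketch classifies how a yeast complex relates to a human complex once yeast proteins are translated through an ortholog mapping. It is only an illustration: the function name, the category labels and the toy identifiers are hypothetical, and the paper's own comparison procedure may differ.

```python
def classify_conservation(yeast_complex, human_complex, orthologs):
    """Compare a yeast complex with a human complex via ortholog mapping.

    Returns one of the hypothetical labels:
      "none"          - no mapped yeast protein occurs in the human complex
      "identical"     - every yeast member maps, and the mapped set equals the human complex
      "proper-subset" - every yeast member maps, and the human complex has extra proteins
      "partial"       - some overlap, but neither of the clean cases above
    """
    yeast = set(yeast_complex)
    human = set(human_complex)
    # Human orthologs of the yeast complex members.
    mapped = {h for y, h in orthologs if y in yeast}
    # Yeast members that have at least one ortholog.
    covered = {y for y, _ in orthologs if y in yeast}
    if not mapped & human:
        return "none"
    if covered == yeast and mapped == human:
        return "identical"
    if covered == yeast and mapped < human:
        return "proper-subset"
    return "partial"


# Toy data (hypothetical identifiers, not real proteins).
toy_orthologs = [("yA", "hA"), ("yB", "hB"), ("yC", "hC")]

# The straightforward case that the text says is NOT the typical pattern.
simple_case = classify_conservation({"yA", "yB"}, {"hA", "hB", "hX"}, toy_orthologs)
# A re-organized case: only part of the yeast complex lands in this human complex.
reorganized_case = classify_conservation({"yA", "yB", "yC"}, {"hA", "hX"}, toy_orthologs)
print(simple_case, reorganized_case)
```

Under the observation reported above, most yeast-human complex pairs would be expected to fall into the "partial" category rather than "proper-subset".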