Srihari Sriganesh, Ning Kang, Leong Hon Wai
School of Computing, National University of Singapore, Singapore 117590, Singapore.
Genome Inform. 2009 Oct;23(1):159-68.
Protein complexes are responsible for most vital biological processes within the cell. Understanding the machinery behind these processes requires detecting and analyzing complexes and their constituent proteins. Many computational approaches to complex detection cluster protein-protein interaction (PPI) networks. Among these clustering approaches, the Markov Clustering (MCL) algorithm has proved reasonably successful, mainly due to its scalability and robustness. However, MCL produces many noisy clusters, which either do not represent any known complex or contain additional proteins (noise) that reduce the accuracy of correctly predicted complexes. Consequently, the accuracy of these clusters when matched with known complexes is quite low. Refining these clusters to improve accuracy requires a deeper understanding of how complexes are organized. Recently, experiments on yeast by Gavin et al. (2006) revealed that proteins within a complex are organized in two parts: core and attachment. Based on these insights, we propose our method (MCL-CA), which applies core-attachment based refinement steps to the clusters produced by MCL. We evaluated the effectiveness of our approach on two different datasets and compared the quality of our predicted complexes with those produced by MCL. The results show that our approach significantly improves the accuracy of predicted complexes when matched with known complexes. As a direct result, MCL-CA is able to cover a larger number of known complexes than MCL. Further, we compare our method with two recently proposed methods, CORE and COACH, which also capitalize on the core-attachment structure. We also discuss several instances to show that our predicted complexes clearly adhere to the core-attachment structure revealed by Gavin et al.
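For readers unfamiliar with MCL, the following is a minimal sketch of the textbook expansion and inflation loop that the abstract builds on. It is not the authors' MCL-CA: the core-attachment refinement steps are not specified in the abstract and are not shown. The function name `mcl`, its parameters (`inflation`, `max_iter`, `tol`), and the toy network are illustrative assumptions.

```python
import numpy as np

def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-6):
    """Textbook Markov Clustering on an undirected graph (illustrative sketch).

    adjacency: square 0/1 (or weighted) matrix of a PPI network.
    Returns a sorted list of clusters, each a sorted list of node indices.
    """
    a = np.asarray(adjacency, dtype=float)
    # Add self-loops, then normalize columns so each column is a probability distribution.
    m = a + np.eye(len(a))
    m = m / m.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        prev = m
        m = m @ m                 # expansion: two-step random walks
        m = m ** inflation        # inflation: strengthen strong flows, weaken weak ones
        m = m / m.sum(axis=0, keepdims=True)
        if np.abs(m - prev).max() < tol:
            break
    # Each row that still carries flow defines a cluster: the columns it attracts.
    clusters = set()
    for row in m:
        members = frozenset(np.flatnonzero(row > 1e-6).tolist())
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)

# Toy network (hypothetical): two disconnected triangles of proteins.
toy = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(toy))  # [[0, 1, 2], [3, 4, 5]]
```

On real PPI networks the clusters returned by a loop like this are the starting point that a refinement method such as MCL-CA would then prune or extend; the inflation value controls cluster granularity and would need tuning per dataset.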