Wang Yijie, Jeong Hyundoo, Yoon Byung-Jun, Qian Xiaoning
School of Informatics, Computing and Engineering, Indiana University, Bloomington, 47405, IN, USA.
Department of Mechatronics Engineering, Incheon National University, Incheon, 22012, South Korea.
BMC Genomics. 2020 Nov 18;21(Suppl 10):615. doi: 10.1186/s12864-020-07010-1.
Current computational methods for identifying conserved protein complexes across multiple Protein-Protein Interaction (PPI) networks have two limitations: they do not explicitly model the topological properties expected of conserved protein complexes, and they scale poorly.
To overcome these issues, we propose a scalable algorithm, ClusterM, for identifying conserved protein complexes across multiple PPI networks by integrating network topology and protein sequence similarity information. ClusterM overcomes the computational barrier of previous methods, whose complexity grows exponentially with the number of PPI networks, and it detects conserved protein complexes that are both topologically separable and cohesive in protein sequence conservation. On two independent compendiums of PPI networks from Saccharomyces cerevisiae (Sce, yeast), Drosophila melanogaster (Dme, fruit fly), Caenorhabditis elegans (Cel, worm), and Homo sapiens (Hsa, human), we demonstrate that ClusterM outperforms other state-of-the-art algorithms by a significant margin and identifies de novo conserved protein complexes across the four species that existing algorithms miss.
ClusterM better captures the desired topological property of a typical conserved protein complex: dense connectivity within the complex and good separation from the rest of the network. Our experiments further show that ClusterM is highly scalable and efficient when analyzing multiple PPI networks.
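The topological property described above (dense inside, well separated outside) can be made concrete with two standard graph measures. The sketch below is only an illustration under stated assumptions: the abstract does not give ClusterM's objective function, so internal edge density and conductance are stand-ins chosen here, and the toy network, the helper names (`build_adjacency`, `internal_density`, `conductance`), and the protein labels are all hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def internal_density(adj, nodes):
    """Fraction of all possible within-set edges that are actually present."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Each internal edge is seen from both endpoints, hence the division by 2.
    internal = sum(1 for u in nodes for v in adj.get(u, ()) if v in nodes) / 2
    return internal / (n * (n - 1) / 2)


def conductance(adj, nodes):
    """Edges leaving the set, divided by the smaller total degree of the two sides.

    Lower values mean the set is better separated from the rest of the network.
    """
    nodes = set(nodes)
    cut = sum(1 for u in nodes for v in adj.get(u, ()) if v not in nodes)
    vol_in = sum(len(adj.get(u, ())) for u in nodes)
    vol_out = sum(len(nbrs) for u, nbrs in adj.items() if u not in nodes)
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0


# Hypothetical toy PPI network: two triangles joined by the single edge C-D.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"),
             ("C", "D"),
             ("D", "E"), ("D", "F"), ("E", "F")]
toy_adj = build_adjacency(toy_edges)

candidate = {"A", "B", "C"}   # one triangle: dense and well separated
straddling = {"B", "C", "D"}  # spans the bridge: sparser and poorly separated

for name, group in [("candidate", candidate), ("straddling", straddling)]:
    print(name, internal_density(toy_adj, group), conductance(toy_adj, group))
```

In this toy network the triangle scores higher on density and lower on conductance than the set that straddles the bridge, which is the pattern the abstract attributes to a typical conserved complex. A multi-network method such as ClusterM would additionally have to weigh sequence similarity across species, which this single-network sketch does not attempt.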