IEEE/ACM Trans Comput Biol Bioinform. 2020 Nov-Dec;17(6):2040-2052. doi: 10.1109/TCBB.2019.2914050. Epub 2020 Dec 8.
Protein-protein interaction (PPI) network alignment is a canonical operation for transferring biological knowledge among species. The alignment of PPI networks has many applications, such as the prediction of protein function, the detection of conserved network motifs, and the reconstruction of species' phylogenetic relationships. A good multiple-network alignment (MNA), by considering data from several species, provides a deep understanding of biological networks and system-level cellular processes. With the massive amount of available PPI data and the increasing number of known PPI networks, the MNA problem is gaining more attention in systems-biology studies. In this paper, we introduce a new scalable and accurate algorithm, called MPGM, for aligning multiple networks. The MPGM algorithm has two main steps: (i) SeedGeneration and (ii) MultiplePercolation. In the first step, the SeedGeneration algorithm uses only protein sequence similarities to generate an initial set of seed tuples. In the second step, the MultiplePercolation algorithm uses the network structures and the seed tuples generated in the first step to align the remaining unmatched nodes. We show that, with respect to different evaluation criteria, MPGM outperforms other state-of-the-art algorithms. In addition, we provide performance guarantees for MPGM under certain classes of network models. We introduce a sampling-based stochastic model for generating k correlated networks, and we prove that, for this model, if a sufficient number of seed tuples is available, the MultiplePercolation algorithm correctly aligns almost all the nodes. Our theoretical results are supported by experimental evaluations on synthetic networks.
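The abstract does not spell out how the percolation step works, so the following is only a minimal sketch of the general idea of seed-based percolation, restricted to two networks instead of k. It is not the MPGM or MultiplePercolation algorithm itself. The function name `percolate`, the threshold parameter `r`, and the adjacency-dict input format are assumptions made for illustration: starting from trusted seed pairs, each newly matched pair gives one "mark" to every pair of its unmatched neighbours, and a candidate pair is matched once it has collected at least `r` marks.

```python
from collections import defaultdict
from itertools import product


def percolate(adj_a, adj_b, seeds, r=2):
    """Expand seed pairs into a one-to-one alignment of two graphs.

    adj_a, adj_b: dict mapping each node to the set of its neighbours.
    seeds: iterable of (node_in_a, node_in_b) pairs assumed to be correct.
    r: hypothetical threshold; a candidate pair is matched once at least
       r already-matched pairs are adjacent to it in both graphs.
    """
    matched_a, matched_b = {}, {}
    marks = defaultdict(int)  # candidate pair -> number of supporting matches
    queue = []

    # Accept the seeds first, keeping the alignment one-to-one.
    for a, b in seeds:
        if a not in matched_a and b not in matched_b:
            matched_a[a] = b
            matched_b[b] = a
            queue.append((a, b))

    while queue:
        a, b = queue.pop()
        # Each pair formed by a neighbour of a and a neighbour of b gains a mark.
        for na, nb in product(adj_a.get(a, ()), adj_b.get(b, ())):
            if na in matched_a or nb in matched_b:
                continue  # one of the two nodes is already aligned
            marks[(na, nb)] += 1
            if marks[(na, nb)] >= r:
                matched_a[na] = nb
                matched_b[nb] = na
                queue.append((na, nb))
    return matched_a
```

In this sketch the seeds would come from a sequence-similarity step, as the abstract describes for SeedGeneration, while the expansion uses only network structure. Raising `r` makes the expansion more conservative at the cost of needing more seeds to get the percolation started.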