School of Computer Science, Wuhan University, Bayi Road, Wuhan, 430072, China.
Centre of Quantum Computation and Intelligent Systems, University of Technology, Sydney, Australia.
BMC Genomics. 2018 Sep 24;19(Suppl 7):670. doi: 10.1186/s12864-018-5027-9.
Aligning protein-protein interaction (PPI) networks is important for discovering functionally conserved substructures across species. In recent years, the global PPI network alignment problem, which seeks the one-to-one alignment with the maximum matching score, has been studied extensively. However, finding large conserved components remains challenging because the problem is NP-hard.
We propose GMAlign, a new graph matching method for global PPI network alignment. It first selects pairs of important proteins as seeds, then gradually expands from them to obtain an initial matching, and finally refines the current result iteratively, based on the vertex cover, to obtain an optimized alignment. We compare GMAlign with state-of-the-art methods on PPI network pairs obtained from the largest BioGRID dataset and validate its performance. The results show that our algorithm produces larger alignments and, for the first time, also finds bigger and denser common connected subgraphs. GMAlign also achieves high-quality biological results, as measured by the functional consistency and semantic similarity of Gene Ontology terms. Moreover, we show that GMAlign obtains structurally and biologically meaningful results when detecting large conserved biological pathways between species.
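The seed-and-expand idea described above can be illustrated with a minimal sketch. This is not the GMAlign implementation: it assumes that node degree is the measure of protein "importance", it pairs neighbours greedily by degree rank, and it omits the vertex-cover-based refinement step entirely. The function names `build_graph`, `seed_and_extend`, and `conserved_edges`, and the toy networks, are hypothetical and used for illustration only.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def by_degree(graph, nodes):
    """Sort nodes by descending degree, breaking ties by name for determinism."""
    return sorted(nodes, key=lambda n: (-len(graph[n]), n))


def seed_and_extend(g1, g2, num_seeds=1):
    """Greedy one-to-one alignment of g1 onto g2.

    Assumption: degree stands in for protein importance. Seeds are the
    top-ranked nodes of each network paired by rank; the alignment then
    grows outward by pairing unmatched neighbours of already matched pairs.
    """
    seeds1 = by_degree(g1, g1)[:num_seeds]
    seeds2 = by_degree(g2, g2)[:num_seeds]
    mapping = dict(zip(seeds1, seeds2))
    used = set(mapping.values())
    frontier = deque(mapping.items())
    while frontier:
        u, v = frontier.popleft()
        cand1 = by_degree(g1, [n for n in g1[u] if n not in mapping])
        cand2 = by_degree(g2, [n for n in g2[v] if n not in used])
        for a, b in zip(cand1, cand2):
            mapping[a] = b
            used.add(b)
            frontier.append((a, b))
    return mapping


def conserved_edges(g1, g2, mapping):
    """Count edges of g1 whose endpoints map onto an edge of g2."""
    hits = 0
    for u, mu in mapping.items():
        for w in g1[u]:
            if w in mapping and mapping[w] in g2[mu]:
                hits += 1
    return hits // 2  # each undirected edge was visited from both endpoints


# Toy networks (hypothetical): a hub with three neighbours and a one-edge tail.
g1 = build_graph([("a", "b"), ("a", "c"), ("a", "d"), ("d", "e")])
g2 = build_graph([("p1", "p2"), ("p1", "p3"), ("p1", "p4"), ("p4", "p5")])
alignment = seed_and_extend(g1, g2)
print(alignment, conserved_edges(g1, g2, alignment))
```

Counting conserved edges gives a simple structural score for comparing alignments. A refinement stage, such as the vertex-cover-based one the abstract mentions, would start from this initial matching and try to improve that score.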
GMAlign is a novel global network alignment tool for discovering large conserved functional components between PPI networks. It also has many potential biological applications, such as discovering conserved pathways and protein complexes across species. The GMAlign software and datasets are available at https://github.com/yzlwhu/GMAlign .