Ay Ferhat, Kellis Manolis, Kahveci Tamer
Computer and Information Science and Engineering, University of Florida, Gainesville, Florida, USA.
J Comput Biol. 2011 Mar;18(3):219-35. doi: 10.1089/cmb.2010.0280.
We consider the problem of aligning two metabolic pathways. Unlike traditional approaches, we do not restrict the alignment to one-to-one mappings between the molecules (nodes) of the input pathways (graphs). We follow the observation that, in nature, different organisms can perform the same or similar functions through different sets of reactions and molecules. The number and the topology of the molecules in these alternative sets often vary from one organism to another. With the motivation that an accurate biological alignment should be able to reveal these functionally similar molecule sets across different species, we develop an algorithm that first measures the similarities between different nodes using a mixture of homology and topological similarity. We combine the two metrics by employing an eigenvalue formulation. We then search for an alignment between the two input pathways that maximizes a similarity score, evaluated as the sum of the similarities of the mapped subnetworks of size at most a given integer k, and also does not contain any conflicting mappings. Here we prove that this maximization is NP-hard by a reduction from the maximum weight independent set (MWIS) problem. We then convert our problem to an instance of MWIS and use an efficient vertex-selection strategy to extract the mappings that constitute our alignment. We name our algorithm SubMAP (Subnetwork Mappings in Alignment of Pathways). We evaluate its accuracy and performance on real datasets. Our empirical results demonstrate that SubMAP can identify biologically relevant mappings that are missed by traditional alignment methods. Furthermore, we observe that SubMAP is scalable for metabolic pathways of arbitrary topology, including searching for a query pathway of size 70 against the complete KEGG database of 1,842 pathways. Implementation in C++ is available at http://bioinformatics.cise.ufl.edu/SubMAP.html.
我们考虑两条代谢途径的比对问题。与传统方法不同,我们并不将比对限制在输入途径(图)的分子(节点)之间的一对一映射上。我们基于这样的观察:在自然界中,不同的生物体可以通过不同的反应集和分子来执行相同或相似的功能。这些替代集合中分子的数量和拓扑结构通常因生物体而异。出于准确的生物比对应能够揭示不同物种间这些功能相似分子集合的动机,我们开发了一种算法,该算法首先使用同源性和拓扑相似性的混合来测量不同节点之间的相似性。我们通过采用特征值公式来结合这两个指标。然后,我们在两条输入途径之间搜索一种比对,该比对能使一个相似性得分最大化,该得分被评估为大小至多为给定整数k的映射子网的相似性之和,并且不包含任何冲突的映射。在这里,我们通过从最大权重独立集(MWIS)问题进行归约,证明了这种最大化是NP难的。然后,我们将我们的问题转化为MWIS的一个实例,并使用一种有效的顶点选择策略来提取构成我们比对的映射。我们将我们的算法命名为SubMAP(途径比对中的子网映射)。我们在真实数据集上评估其准确性和性能。我们的实证结果表明,SubMAP能够识别传统比对方法遗漏的生物学相关映射。此外,我们观察到SubMAP对于任意拓扑结构的代谢途径都是可扩展的,包括在包含1842条途径的完整KEGG数据库中搜索大小为70的查询途径。用C++实现的代码可在http://bioinformatics.cise.ufl.edu/SubMAP.html获取。