Faust Karoline, Croes Didier, van Helden Jacques
Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe), Université Libre de Bruxelles, Campus Plaine, CP 263, Bld du Triomphe, B-1050 Bruxelles, Belgium.
J Mol Biol. 2009 May 1;388(2):390-414. doi: 10.1016/j.jmb.2009.03.006. Epub 2009 Mar 10.
Metabolic databases contain information about thousands of small molecules and reactions, which can be represented as networks. In the context of metabolic reconstruction, pathways can be inferred by searching for optimal paths in such networks. A recurrent problem is the presence of pool metabolites (e.g., water, energy carriers, and cofactors), which are connected to hundreds of reactions and thus create irrelevant shortcuts between nodes of the network. One solution to this problem relies on weighted networks that penalize highly connected compounds. A more refined solution takes the chemical structure of reactants into account in order to differentiate between side and main compounds of a reaction. Thanks to an intensive annotation effort at KEGG, decompositions of reactions into reactant pairs (RPAIR), categorized by their role (main, trans, cofac, ligase, and leave), are now available. The goal of this article is to evaluate the impact of RPAIR data on pathfinding in metabolic networks. To this end, we measure the impact of different parameters concerning the construction of the metabolic network: mapping of reactions and reactant pairs onto a graph, use of selected categories of reactant pairs, weighting schemes for compounds and reactions, removal of highly connected metabolites, and reaction directionality. In total, we tested 104 combinations of parameters and identified their optimal values for pathfinding on the basis of 55 reference pathways from three organisms. The best-performing metabolic network combines the biochemical knowledge encoded by KEGG RPAIR with a weighting scheme penalizing highly connected compounds. With this network, we could recover reference pathways from Escherichia coli with an average accuracy of 93% (32 pathways), from Saccharomyces cerevisiae with an average accuracy of 66% (11 pathways), and from humans with an average accuracy of 70% (12 pathways). Our pathfinding approach is available as part of the Network Analysis Tools.
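To make the two ideas in the abstract concrete (filtering reactant pairs by RPAIR category, and weighting compounds to penalize hubs), here is a minimal sketch. It is not the authors' implementation in the Network Analysis Tools. It assumes an undirected compound-reaction bipartite graph, a compound weight equal to its degree (the number of reactions it takes part in), a reaction weight of 1, and a path cost equal to the sum of its node weights, computed with Dijkstra's algorithm. The reactant-pair data and all identifiers (`R1`, `A`, `toy_pairs`, `lightest_path`, and so on) are hypothetical.

```python
import heapq
from collections import defaultdict

# The five RPAIR roles named in the abstract.
ALL_ROLES = {"main", "trans", "cofac", "ligase", "leave"}


def toy_pairs():
    """Hypothetical reactant pairs: (reaction, compound_1, compound_2, role)."""
    pairs = [
        ("R1", "A", "B", "main"), ("R1", "ATP", "ADP", "cofac"),
        ("R2", "B", "C", "main"),
        ("R3", "C", "D", "main"), ("R3", "ATP", "ADP", "cofac"),
    ]
    # Unrelated ATP-consuming reactions turn ATP/ADP into highly connected pool metabolites.
    for i in range(4, 9):
        pairs.append((f"R{i}", f"X{i}", f"Y{i}", "main"))
        pairs.append((f"R{i}", "ATP", "ADP", "cofac"))
    return pairs


def build_graph(pairs, roles):
    """Undirected compound-reaction graph using only reactant pairs whose role is in `roles`."""
    adj = defaultdict(set)
    reactions = set()
    for reaction, c1, c2, role in pairs:
        if role in roles:
            reactions.add(reaction)
            for compound in (c1, c2):
                adj[reaction].add(compound)
                adj[compound].add(reaction)
    return dict(adj), reactions


def node_weights(adj, reactions, weighted):
    """Assumed scheme: compound weight = degree, reaction weight = 1 (all 1 if unweighted)."""
    return {n: len(adj[n]) if weighted and n not in reactions else 1 for n in adj}


def lightest_path(adj, weights, source, target):
    """Dijkstra on node weights; returns (cost, path) or None if target is unreachable."""
    if source not in adj or target not in adj:
        return None
    best = {source: weights[source]}
    prev = {}
    heap = [(weights[source], source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best[node]:
            continue  # stale heap entry
        for nxt in sorted(adj[node]):  # sorted for deterministic tie-breaking
            new_cost = cost + weights[nxt]
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    return None


if __name__ == "__main__":
    pairs = toy_pairs()
    for label, roles, weighted in [
        ("all pairs, unweighted", ALL_ROLES, False),
        ("all pairs, degree-weighted", ALL_ROLES, True),
        ("main pairs only, degree-weighted", {"main"}, True),
    ]:
        adj, reactions = build_graph(pairs, roles)
        cost, path = lightest_path(adj, node_weights(adj, reactions, weighted), "A", "D")
        print(f"{label}: cost={cost} path={' -> '.join(path)}")
```

On this toy network, the unweighted search takes a shortcut through the ATP/ADP hub, whereas either degree weighting or keeping only `main` reactant pairs recovers the route A, B, C, D through R1, R2 and R3. The sketch ignores reaction directionality and the other mappings compared in the article.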