Krumsiek Jan, Suhre Karsten, Illig Thomas, Adamski Jerzy, Theis Fabian J
Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Germany.
BMC Syst Biol. 2011 Jan 31;5:21. doi: 10.1186/1752-0509-5-21.
With the advent of high-throughput targeted metabolic profiling techniques, the question of how to interpret and analyze the resulting vast amount of data is becoming increasingly important. In this work we address the reconstruction of metabolic reactions from cross-sectional metabolomics data, that is, without the requirement for time-resolved measurements or specific system perturbations. Previous studies in this area mainly focused on Pearson correlation coefficients, which, however, are generally incapable of distinguishing between direct and indirect metabolic interactions.
In our new approach we propose the application of a Gaussian graphical model (GGM), an undirected probabilistic graphical model that estimates the conditional dependence between variables. GGMs are based on partial correlation coefficients, that is, pairwise Pearson correlation coefficients conditioned on the correlation with all other metabolites. We first demonstrate the general validity of the method and its advantages over regular correlation networks using computer-simulated reaction systems. We then estimate a GGM on data from a large human population cohort, covering 1020 fasting blood serum samples with 151 quantified metabolites. The GGM is much sparser than the correlation network, shows a modular structure with respect to metabolite classes, and is stable with respect to the choice of samples in the data set. Using human fatty acid metabolism as an example, we demonstrate for the first time that high partial correlation coefficients generally correspond to known metabolic reactions. This feature is evaluated both manually, by investigating specific pairs of high-scoring metabolites, and systematically, on a literature-curated model of fatty acid synthesis and degradation. Our method detects many known reactions along with possibly novel pathway interactions, representing candidates for further experimental examination.
In summary, we demonstrate strong signatures of intracellular pathways in blood serum data, and provide a valuable tool for the unbiased reconstruction of metabolic reactions from large-scale metabolomics data sets.
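To illustrate the difference between Pearson and partial correlation described above, the following is a minimal sketch, not the authors' implementation. It assumes full-order partial correlations are obtained from the inverse of the sample covariance matrix (the precision matrix) as -P_ij / sqrt(P_ii P_jj). The three-variable linear chain a → b → c is a hypothetical toy system, not data from the study, and the function name `partial_correlations` is chosen for illustration only.

```python
import numpy as np


def partial_correlations(data):
    """Full-order partial correlations for a (samples x variables) array.

    Assumes more samples than variables, so the sample covariance
    matrix is invertible.
    """
    cov = np.cov(data, rowvar=False)
    precision = np.linalg.inv(cov)           # inverse covariance matrix
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)              # diagonal is 1 by convention
    return pcor


# Hypothetical toy chain: a drives b, b drives c; a and c are linked only via b.
rng = np.random.default_rng(0)
n_samples = 1000
a = rng.normal(size=n_samples)
b = a + 0.5 * rng.normal(size=n_samples)
c = b + 0.5 * rng.normal(size=n_samples)
data = np.column_stack([a, b, c])

pearson = np.corrcoef(data, rowvar=False)
pcor = partial_correlations(data)

print("Pearson a-c:", round(pearson[0, 2], 3))
print("Partial a-c:", round(pcor[0, 2], 3))
print("Partial a-b:", round(pcor[0, 1], 3))
print("Partial b-c:", round(pcor[1, 2], 3))
```

In this toy setting the Pearson correlation between a and c is high even though they interact only indirectly, whereas their partial correlation is close to zero; the direct pairs a-b and b-c retain clearly non-zero partial correlations. Real metabolomics data would additionally call for significance testing and a cutoff before drawing edges, which this sketch omits.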