Cooke Juliette, Wieder Cecilia, Poupin Nathalie, Frainay Clément, Ebbels Timothy, Jourdan Fabien
Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France.
Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, United Kingdom.
Metabolomics. 2025 Sep 9;21(5):136. doi: 10.1007/s11306-025-02335-y.
Initially developed for transcriptomics data, pathway analysis (PA) methods can introduce biases when applied to metabolomics data, especially if input parameters are not chosen with care. This is particularly true for exometabolomics data, where there can be many metabolic steps between the measured exported metabolites in the profile and internal disruptions in the organism. However, evaluating PA methods experimentally is practically impossible when the sample's "true" metabolic disruption is unknown.
This study aims to show that PA can lead to non-specific enrichment, potentially resulting in false assumptions about the true cause of perturbed metabolic states.
In silico metabolic modelling makes it possible to create controlled disruptions in metabolic networks. SAMBA, a constraint-based modelling approach, simulates metabolic profiles for entire pathway knockouts, providing both a known disruption site and a simulated metabolic profile to submit to PA methods. A reliable PA method should then report the known disrupted pathway among the significantly enriched pathways for that profile.
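The check described above can be illustrated with over-representation analysis (ORA), one common family of PA methods. The sketch below is a minimal, standard-library illustration and not the pipeline used in the study: the pathway names, metabolite identifiers, and the "changed" profile are hypothetical toy values, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N items from M, of which n are 'successes'."""
    upper = min(n, N)
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def ora(pathways, changed, background):
    """Return {pathway: p-value} from a one-sided over-representation test."""
    background = set(background)
    changed = set(changed) & background
    results = {}
    for name, members in pathways.items():
        members = set(members) & background
        overlap = len(members & changed)
        results[name] = hypergeom_sf(
            overlap, len(background), len(members), len(changed)
        )
    return results


# Hypothetical toy data: 20 background metabolites, three pathways of five each.
background = [f"m{i}" for i in range(20)]
pathways = {
    "pathway_A": [f"m{i}" for i in range(0, 5)],    # assumed knocked-out pathway
    "pathway_B": [f"m{i}" for i in range(5, 10)],   # assumed downstream pathway
    "pathway_C": [f"m{i}" for i in range(10, 15)],
}
# Assumed simulated profile: only metabolites of pathway_B change.
changed = ["m5", "m6", "m7", "m8"]

results = ora(pathways, changed, background)
for name, p in sorted(results.items()):
    print(name, round(p, 5))
```

In this toy setting the pathway assumed to be knocked out (pathway_A) has no changed metabolites and is therefore not enriched, while a neighbouring pathway is. This is the kind of non-specific enrichment the benchmark is designed to reveal.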
Through network-level statistics, visualisation, and graph-based metrics, we show that even when a given pathway is completely blocked, PA methods applied to the corresponding simulated metabolic profile may not report it as significantly enriched. Possible causes include the chosen PA method, the initial pathway set definition, and the network's inherent structure.
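One simple graph-based quantity that relates a blocked pathway to the measured profile is the number of reaction steps separating them. The sketch below computes it with a breadth-first search. It is an assumption made for illustration, not necessarily one of the metrics used in the study, and the network and node names are hypothetical.

```python
from collections import deque


def min_steps(graph, sources, targets):
    """Fewest directed edges from any source node to any target node.

    Returns None when no target is reachable.
    """
    targets = set(targets)
    seen = set(sources)
    queue = deque((node, 0) for node in seen)
    while queue:
        node, dist = queue.popleft()
        if node in targets:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical toy network: metabolite -> metabolites it can be converted into.
graph = {
    "a": ["b"],
    "b": ["c"],
    "c": ["d", "x"],
    "d": ["e_ext"],  # e_ext stands for an exported, measurable metabolite
    "x": [],
}

blocked_pathway = {"a", "b"}   # assumed metabolites of the knocked-out pathway
measured = {"e_ext"}           # assumed exometabolomics readout

steps = min_steps(graph, blocked_pathway, measured)
unreachable = min_steps(graph, {"x"}, measured)
print(steps, unreachable)
```

A large step count, or no path at all, suggests that the measured metabolites are far from the disruption, which is one way the network's structure could hide the blocked pathway from PA.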
This work highlights how some metabolomics data may not be suited to typical PA methods, and serves as a benchmark for analysing, improving and potentially developing new PA tools.