Tazza Gabriele, Moro Francesco, Ruggeri Dario, Teusink Bas, Vidács László
Department of Software Engineering, University of Szeged, Szeged, Hungary.
Systems Biology Lab, AIMMS/ALIFE, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands.
Comput Struct Biotechnol J. 2025 Aug 7;27:3609-3617. doi: 10.1016/j.csbj.2025.08.004. eCollection 2025.
Understanding cellular behavior relies on integrating metabolism and its regulation. Multi-omics data provide a detailed snapshot of the molecular processes that underpin cellular functions and their regulation, describing the current state of the cell. While Machine Learning (ML) models can uncover complex patterns and relationships within these data, they require large training datasets and often lack interpretability. Mathematical models such as Genome-Scale Metabolic Models (GEMs), on the other hand, offer a structured framework for analyzing the organization and dynamics of specific cellular mechanisms. However, they do not allow seamless integration of omics information. Recently, a new framework for embedding GEMs in a neural network has been introduced: these hybrid models combine the strengths of mechanistic and data-driven approaches, offering a promising platform for integrating different data sources with mechanistic knowledge. In this study, we present a Metabolic-Informed Neural Network (MINN) that uses multi-omics data to predict metabolic fluxes in , under different growth rates and gene knockouts. We compare its performance against pure ML and parsimonious Flux Balance Analysis (pFBA), demonstrating that it improves prediction performance. We also highlight how conflicts can emerge between the data-driven and the mechanistic objectives, and we propose different solutions to mitigate them. Finally, we illustrate a strategy for coupling the MINN with pFBA, enhancing the interpretability of the solution.
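A minimal sketch of why the data-driven and mechanistic objectives can conflict, assuming a combined loss of the form "squared error to measured fluxes + λ·‖Sv‖²", where S is a stoichiometric matrix and ‖Sv‖² penalizes violations of the steady-state constraint Sv = 0. This is a common pattern for mechanism-informed networks, not necessarily the MINN's actual loss, weighting, or architecture. To keep the example small, the neural network is replaced by a free flux vector, so the optimum has a closed form. The matrix `S`, the measurements `v_obs`, and the helpers `hybrid_loss` and `best_compromise` are all hypothetical.

```python
import numpy as np

# Hypothetical toy stoichiometric matrix (3 internal metabolites x 5 reactions).
# It stands in for a GEM and is not taken from the paper.
S = np.array([
    [1, -1, -1,  0,  0],   # metabolite A
    [0,  1,  0,  1, -1],   # metabolite B
    [0,  0,  1, -1,  0],   # metabolite C
], dtype=float)

def hybrid_loss(v_pred, v_obs, S, lam):
    """Assumed hybrid loss: data term (squared error to measured fluxes)
    plus a lam-weighted steady-state penalty ||S v||^2."""
    data = np.sum((v_pred - v_obs) ** 2)
    mech = np.sum((S @ v_pred) ** 2)
    return data + lam * mech, data, mech

def best_compromise(v_obs, S, lam):
    """Closed-form minimizer of ||v - v_obs||^2 + lam * ||S v||^2.
    Setting the gradient to zero gives (I + lam * S^T S) v = v_obs;
    the matrix is positive definite for lam >= 0, so the solve is well posed."""
    n = S.shape[1]
    return np.linalg.solve(np.eye(n) + lam * S.T @ S, v_obs)

# Hypothetical noisy "measurements" that are not mass-balanced (S @ v_obs != 0).
v_obs = np.array([10.0, 7.0, 2.5, 2.0, 9.0])

for lam in [0.0, 1.0, 100.0]:
    v = best_compromise(v_obs, S, lam)
    _, data, mech = hybrid_loss(v, v_obs, S, lam)
    print(f"lambda={lam:>6}: data error={data:.4f}, steady-state violation={mech:.4f}")
```

As λ grows, the steady-state violation shrinks while the data error grows: when the measurements are not mass-balanced, no flux vector can drive both terms to zero, so the weighting decides which objective wins.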