Faculty of Engineering and Physical Sciences, University of Southampton, Southampton SO17 1BJ, United Kingdom.
Department of Computer Science and Information Systems, Teesside University, Middlesbrough TS1 3BX, United Kingdom.
Proc Natl Acad Sci U S A. 2020 Aug 4;117(31):18869-18879. doi: 10.1073/pnas.2002959117. Epub 2020 Jul 16.
Metabolic modeling and machine learning are key components of the emerging next generation of systems and synthetic biology tools that target the genotype-phenotype-environment relationship. It is becoming clear that their value is maximized when they are combined rather than used in isolation. However, the potential of integrating these two frameworks for omic data augmentation and integration remains largely unexplored. We propose, rigorously assess, and compare machine-learning-based data integration techniques that combine gene expression profiles with computationally generated metabolic flux data to predict yeast cell growth. To this end, we create strain-specific metabolic models for 1,143 mutants and test 27 machine-learning methods, incorporating state-of-the-art feature selection and multiview learning approaches. We propose a multiview neural network using fluxomic and transcriptomic data, showing that the former increases the predictive accuracy of the latter and reveals functional patterns that are not directly deducible from gene expression alone. We test the proposed neural network on a further 86 strains generated in a different experiment, thereby verifying its robustness on an additional independent dataset. Finally, we show that introducing mechanistic flux features also improves predictions for knockout strains whose genes were not modeled in the metabolic reconstruction. Our results thus demonstrate that fusing experimental cues with in silico models based on known biochemistry can contribute disjoint information toward biologically informed and interpretable machine learning. Overall, this study provides tools for understanding and manipulating complex phenotypes, increasing both the prediction accuracy and the extent of discernible mechanistic biological insights.
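The core idea of combining a transcriptomic view with a fluxomic view to predict growth can be illustrated with a minimal sketch. This is not the multiview neural network described in the abstract: it swaps in closed-form ridge regression on concatenated features (simple early integration), and it runs on synthetic data. All names (`ridge_fit`, `compare_views`, `expr`, `flux`, `growth`) and all array sizes are hypothetical and chosen only for illustration.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression; the last weight is an unpenalised bias."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[-1, -1] = 0.0  # leave the bias term unregularised
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def ridge_predict(X, w):
    """Apply weights from ridge_fit to new samples."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w


def mse(y_true, y_pred):
    """Mean squared error as a plain float."""
    return float(np.mean((y_true - y_pred) ** 2))


def compare_views(expr, flux, growth, n_train, alpha=1.0):
    """Held-out MSE for each single view and for the concatenated views."""
    views = {
        "expression": expr,
        "flux": flux,
        "combined": np.hstack([expr, flux]),  # early integration of both views
    }
    scores = {}
    for name, X in views.items():
        w = ridge_fit(X[:n_train], growth[:n_train], alpha)
        scores[name] = mse(growth[n_train:], ridge_predict(X[n_train:], w))
    return scores


# Synthetic stand-in data (assumption): 200 "strains", 20 expression
# features, 10 flux features. Growth is constructed to depend on one
# feature from each view, so neither view alone explains it fully.
rng = np.random.default_rng(0)
n_strains = 200
expr = rng.normal(size=(n_strains, 20))
flux = rng.normal(size=(n_strains, 10))
growth = expr[:, 0] + flux[:, 0] + 0.1 * rng.normal(size=n_strains)

scores = compare_views(expr, flux, growth, n_train=150)
for name, value in scores.items():
    print(f"{name}: held-out MSE = {value:.3f}")
```

Because the synthetic target is built from both views, the combined model should show a lower held-out error than either single view here. That outcome follows from how the toy data were generated and says nothing about the results reported in the study.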