Vijayakumar Supreeta, Rahman Pattanathu K S M, Angione Claudio
Department of Computer Science and Information Systems, Teesside University, Middlesbrough, North Yorkshire TS1 3BX, UK.
Centre for Enzyme Innovation, Institute of Biological and Biomedical Sciences, School of Biological Sciences, University of Portsmouth, Portsmouth, Hampshire PO1 2UP, UK.
iScience. 2020 Nov 18;23(12):101818. doi: 10.1016/j.isci.2020.101818. eCollection 2020 Dec 18.
Machine learning has recently emerged as a promising tool for inferring multi-omic relationships in biological systems. At the same time, genome-scale metabolic models (GSMMs) can be integrated with such multi-omic data to refine phenotypic predictions. In this work, we use a multi-omic machine learning pipeline to analyze a GSMM of Synechococcus sp. PCC 7002, a cyanobacterium with large potential to produce renewable biofuels. We use regularized flux balance analysis to observe how fluxes across photosynthesis and energy metabolism respond to changes in growth conditions. We then incorporate principal-component analysis, k-means clustering, and LASSO regularization to reduce dimensionality and extract key cross-omic features. Our results suggest that combining metabolic modeling with machine learning elucidates mechanisms used by cyanobacteria to cope with fluctuations in light intensity and salinity that cannot be detected using transcriptomics alone. Furthermore, GSMMs introduce critical mechanistic details that improve the performance of omic-based machine learning methods.
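As an illustration of the feature-extraction step named in the abstract, the following is a minimal sketch of LASSO regularization selecting a few informative features from a combined table. It is not the authors' implementation: the data are synthetic, the split into "transcript" and "flux" columns is hypothetical, and the solver is a plain cyclic coordinate descent written with the Python standard library only.

```python
import random

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w

# Synthetic data (assumption): 60 samples, 8 features.
# Columns 0-3 stand in for transcript levels, columns 4-7 for predicted fluxes.
random.seed(0)
n_samples, n_features = 60, 8
X = [[random.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
# The toy phenotype depends on one "transcript" column (1) and one "flux" column (5).
y = [2.0 * row[1] - 1.5 * row[5] + random.gauss(0.0, 0.1) for row in X]

weights = lasso_coordinate_descent(X, y, lam=0.2)
selected = [j for j, w_j in enumerate(weights) if w_j != 0.0]
print("non-zero coefficients at columns:", selected)
```

The L1 penalty drives the coefficients of uninformative columns to exactly zero, which is why LASSO is often used to extract a short list of features from high-dimensional omic data. The penalty strength `lam=0.2` is an arbitrary choice for this toy example; in practice it would typically be tuned, for instance by cross-validation.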