Wu Stephen Gang, Wang Yuxuan, Jiang Wu, Oyetunde Tolutola, Yao Ruilian, Zhang Xuehong, Shimizu Kazuyuki, Tang Yinjie J, Bao Forrest Sheng
Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, Missouri, United States of America.
Department of Computer Science and Engineering, Ohio State University, Columbus, Ohio, United States of America.
PLoS Comput Biol. 2016 Apr 19;12(4):e1004838. doi: 10.1371/journal.pcbi.1004838. eCollection 2016 Apr.
13C metabolic flux analysis (13C-MFA) is widely used to measure in vivo enzyme reaction rates (i.e., metabolic fluxes) in microorganisms. Mining existing fluxomic data for the hidden relationships between environmental and genetic factors and metabolic fluxes will lead to predictive models that can significantly accelerate flux quantification. In this paper, we present MFlux (http://mflux.org), a web-based platform that predicts bacterial central metabolism via machine learning, leveraging data from approximately 100 13C-MFA papers on heterotrophic bacterial metabolism. Three machine learning methods, namely Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and Decision Tree, were employed to study the complex relationship between influential factors and metabolic fluxes. We performed a grid search for the best parameter set for each algorithm and verified performance through 10-fold cross-validation. SVM yielded the highest accuracy of the three algorithms. Further, we employed quadratic programming to adjust the predicted flux profiles so that they satisfy stoichiometric constraints. Multiple case studies show that MFlux can reasonably predict fluxomes as a function of bacterial species, substrate type, growth rate, oxygen condition, and cultivation method. Because published studies tend to focus on model organisms grown on particular carbon sources, the fluxomes in the dataset are biased, which may limit the applicability of the machine learning models. This limitation can be addressed as more 13C-MFA papers on non-model species are published.
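The quadratic-programming adjustment mentioned in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible formulation: find the flux vector closest (in the least-squares sense) to the machine-learning prediction subject only to the steady-state mass balance S v = 0. With equality constraints alone, this problem has the closed-form solution v = v_pred - Sᵀ(S Sᵀ)⁻¹ S v_pred. The abstract does not give the actual objective or constraints used by MFlux, which may also include flux bounds, so the function names, the toy one-metabolite network, and the numbers below are all hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("S S^T is singular; the constraints are redundant")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def balance_fluxes(S, v_pred):
    """Return the vector closest to v_pred (least squares) that satisfies S v = 0.

    Assumption: equality constraints only, so the quadratic program reduces to
    an orthogonal projection onto the null space of S.
    """
    m, n = len(S), len(v_pred)
    # Mass-balance violation of the raw prediction: S v_pred
    residual = [sum(S[i][j] * v_pred[j] for j in range(n)) for i in range(m)]
    # Gram matrix S S^T (m x m)
    gram = [[sum(S[i][k] * S[j][k] for k in range(n)) for j in range(m)]
            for i in range(m)]
    # Lagrange multipliers: (S S^T) lam = S v_pred
    lam = solve_linear(gram, residual)
    # Projected fluxes: v_pred - S^T lam
    return [v_pred[j] - sum(S[i][j] * lam[i] for i in range(m)) for j in range(n)]


# Hypothetical toy network: one metabolite produced by v1 and consumed by v2 and v3.
S = [[1.0, -1.0, -1.0]]
v_pred = [100.0, 60.0, 30.0]  # illustrative predicted fluxes; 10 units are unaccounted for
v_adj = balance_fluxes(S, v_pred)
print(v_adj)
```

In this toy case the 10-unit imbalance is spread evenly across the three fluxes. A formulation with inequality constraints (for example, irreversible reactions with non-negative flux) has no closed form and would need a general QP solver.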