State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.
Chongqing Key Laboratory of Oral Diseases and Biomedical Sciences, Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, and College of Stomatology, Chongqing Medical University, Chongqing, China.
Sci Rep. 2017 Aug 22;7(1):9048. doi: 10.1038/s41598-017-08793-2.
With the rapid growth of microbial metabolic networks, accurately determining the intracellular concentrations of metabolites on a large scale is critical to the development of metabolic engineering and synthetic biology. Computational methods complement experimental ones as effective tools for assessing intracellular metabolite concentrations. In this study, a dataset of 130 metabolites from E. coli and S. cerevisiae with available experimental concentrations was used to develop a support vector machine (SVM) model of the negative logarithm of the concentration (-logC). In addition to common descriptors of molecular properties, this statistical model included two special types of descriptors: metabolic network topological descriptors and metabolic pathway descriptors. Variable selection, including a genetic algorithm (GA), reduced the full set of 1997 descriptors to 14. The model was evaluated by internal validation using 10-fold and leave-one-out (LOO) cross-validation, and by external validation through prediction of the -logC values of the test set. The developed SVM model is robust and has strong predictive potential (n = 91, m = 14, R = 0.744, RMSE = 0.730, Q = 0.57; R = 0.59, RMSE = 0.702, Q = 0.58). This analysis could provide an effective tool for large-batch prediction of the intracellular concentrations of microbial metabolites.
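The internal validation described in the abstract (10-fold and leave-one-out cross-validation of a -logC model, summarized by RMSE and a cross-validated Q value) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the helper names are hypothetical, the ordinary least-squares line is a stand-in for the SVM, and Q² is taken in its common form 1 - PRESS / total sum of squares, which may not match the exact definition used in the paper.

```python
import math
import random


def neg_log10(concentration):
    """Return -log10(C), the response modelled in the abstract; C must be > 0."""
    return -math.log10(concentration)


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def q2(y_true, y_cv_pred):
    """Cross-validated Q^2 = 1 - PRESS / SS_tot (a common definition, assumed here)."""
    mean_y = sum(y_true) / len(y_true)
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_cv_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - press / ss_tot


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Placeholder model (NOT an SVM): one-descriptor least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def cross_val_predict(xs, ys, k):
    """Predict each sample from a model trained without its fold (k = n gives LOO)."""
    preds = [None] * len(xs)
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = model(xs[i])
    return preds


# Synthetic training set of 91 samples (same size as n in the abstract;
# the values themselves are invented and carry no biological meaning).
rng = random.Random(42)
xs = [rng.uniform(0.0, 10.0) for _ in range(91)]
ys = [0.3 * x + 2.0 + rng.gauss(0.0, 0.5) for x in xs]

pred_10fold = cross_val_predict(xs, ys, k=10)
pred_loo = cross_val_predict(xs, ys, k=len(xs))

print("10-fold: Q2 = %.3f, RMSE = %.3f" % (q2(ys, pred_10fold), rmse(ys, pred_10fold)))
print("LOO:     Q2 = %.3f, RMSE = %.3f" % (q2(ys, pred_loo), rmse(ys, pred_loo)))
```

Replacing `fit_line` with a support vector regressor over the 14 selected descriptors would bring the sketch closer to the workflow the abstract describes. The fold logic and the metric definitions would stay the same.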