Department of Organic and Inorganic Chemistry, University of the Basque Country UPV/EHU, and Basque Center for Biophysics CSIC-UPV/EHU, Leioa 48940, Great Bilbao, Biscay, Basque Country, Spain.
Department of Systems and Computer Engineering, Carleton University, K1S 5B6, Ottawa, ON, Canada.
Curr Top Med Chem. 2021;21(9):819-827. doi: 10.2174/1568026621666210331161144.
Checking the connectivity (structure) of complex Metabolic Reaction Network (MRN) models proposed for new microorganisms with promising properties is an important goal in chemical biology.
In principle, we can check these models by hand (manual curation). However, this is challenging because of the very large number of possible pairs of nodes (candidate metabolic reactions).
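The next snippet shows why manual curation scales poorly. If each unordered pair of distinct metabolites is treated as one candidate reaction, a network with n nodes has C(n, 2) = n(n - 1)/2 candidates, so the count grows quadratically. This is a minimal sketch. The node counts below are illustrative values, not figures from the study, and `candidate_reactions` is a hypothetical helper.

```python
from math import comb


def candidate_reactions(n_metabolites: int) -> int:
    """Number of unordered pairs of distinct nodes, C(n, 2) = n * (n - 1) / 2.

    Hypothetical helper: treats every node pair as one candidate reaction.
    """
    return comb(n_metabolites, 2)


# Illustrative network sizes only (not taken from the MRN dataset)
for n in (100, 1000, 10000):
    print(n, candidate_reactions(n))
# 100 4950
# 1000 499500
# 10000 49995000
```

Each tenfold increase in network size multiplies the number of pairs to inspect by roughly one hundred.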
In this work, we used Combinatorial Perturbation Theory and Machine Learning (CPTML) techniques to seek a CPTML model for the MRNs of more than 40 organisms compiled by Barabási's group. First, we quantified the local structure of a very large set of nodes in each MRN using a new class of node index called Markov linear indices f_k. Next, we calculated CPT operators for 150,000 combinations of query and reference nodes of the MRNs. Last, we used these CPT operators as inputs to different ML algorithms.
The CPTML linear model obtained with the Linear Discriminant Analysis (LDA) algorithm discriminates nodes (metabolites) with correctly assigned reactions from incorrectly assigned nodes. It achieves accuracy, specificity, and sensitivity in the range of 85-100% in both the training and external validation series.
Meanwhile, PTML models based on Bayesian network, J48 decision tree, and Random Forest algorithms were identified as the three best non-linear models, with accuracy greater than 97.5%. The present work opens the door to the study of MRNs of multiple organisms using PTML models.
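The descriptor and operator steps of the method can be sketched in plain Python on a toy graph. The abstract does not give the formulas, so every definition below except the evaluation metrics is an assumption:

- the Markov matrix is taken to be the row-normalized adjacency matrix;
- f_k is assumed to be that matrix raised to the k-th power and applied to the node-degree vector;
- the CPT operator is assumed to be the difference between the query and reference nodes' indices for each k.

The function names are hypothetical, and the ML classifiers (LDA, Bayesian network, J48, Random Forest) are omitted. Only the accuracy, sensitivity, and specificity formulas are the standard definitions.

```python
from typing import Dict, List

Matrix = List[List[float]]


def markov_matrix(adj: Matrix) -> Matrix:
    # ASSUMPTION: row-normalise the adjacency matrix so each row sums to 1;
    # isolated nodes keep an all-zero row.
    rows = []
    for row in adj:
        deg = sum(row)
        rows.append([a / deg if deg else 0.0 for a in row])
    return rows


def markov_linear_indices(adj: Matrix, k_max: int) -> List[List[float]]:
    # ASSUMPTION (hypothetical f_k): f_0 = node degrees, f_k = Pi * f_{k-1},
    # i.e. the expected degree of the node reached after k random-walk steps.
    pi = markov_matrix(adj)
    f = [[float(sum(row)) for row in adj]]
    for _ in range(k_max):
        prev = f[-1]
        f.append([sum(p * x for p, x in zip(row, prev)) for row in pi])
    return f


def cpt_operators(f: List[List[float]], query: int, reference: int) -> List[float]:
    # ASSUMPTION (hypothetical CPT operator): difference between the query
    # node's index and the reference node's index, for every order k.
    return [fk[query] - fk[reference] for fk in f]


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    # Standard definitions used to report classifier performance.
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy path graph 0 - 1 - 2 (illustrative only, not an MRN from the study)
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
f = markov_linear_indices(adj, k_max=2)
print(f)                       # [[1.0, 2.0, 1.0], [2.0, 1.0, 2.0], [1.0, 2.0, 1.0]]
print(cpt_operators(f, 0, 1))  # [-1.0, 1.0, -1.0]
print(classification_metrics(tp=9, tn=8, fp=2, fn=1))
# {'accuracy': 0.85, 'sensitivity': 0.9, 'specificity': 0.8}
```

In a full workflow, vectors like the output of `cpt_operators` would form one row of the classifier's input matrix, and the metrics would be computed separately on the training and external validation series.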