Predicting the Pathway Involvement of Metabolites in Both Pathway Categories and Individual Pathways.

Authors

Huckvale Erik D, Moseley Hunter N B

Affiliations

Markey Cancer Center, University of Kentucky, Lexington, KY, USA.

Superfund Research Center, University of Kentucky, Lexington, KY, USA.

Publication information

bioRxiv. 2024 Aug 9:2024.08.07.607025. doi: 10.1101/2024.08.07.607025.

Abstract

Metabolism is the network of chemical reactions that sustain cellular life. Parts of this metabolic network are defined as metabolic pathways containing specific biochemical reactions. Products and reactants of these reactions are called metabolites, which are associated with certain human-defined metabolic pathways. Metabolic knowledgebases, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG), contain metabolites, reactions, and pathway annotations; however, such resources are incomplete due to current limits of metabolic knowledge. To fill in missing metabolite pathway annotations, past machine learning models showed some success at predicting KEGG Level 2 pathway category involvement of metabolites based on their chemical structure. Here, we present the first machine learning model to predict metabolite association with the more granular KEGG Level 3 metabolic pathways. We used a feature and dataset engineering approach to generate over one million metabolite-pathway entries in the dataset used to train a single binary classifier. This approach produced a mean Matthews correlation coefficient (MCC) of 0.806 ± 0.017 SD across 100 cross-validation iterations. The 172 Level 3 pathways were predicted with an overall MCC of 0.726. Moreover, metabolite association with the 12 Level 2 pathway categories was predicted with an overall MCC of 0.891, representing significant transfer learning from the Level 3 pathway entries. These are the best metabolite-pathway prediction results published so far in the field.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d56/11326255/f3bca32900bb/nihpp-2024.08.07.607025v1-f0001.jpg
