Predicting the Pathway Involvement of All Pathway and Associated Compound Entries Defined in the Kyoto Encyclopedia of Genes and Genomes.

Author information

Huckvale Erik D, Moseley Hunter N B

Affiliations

Markey Cancer Center, University of Kentucky, Lexington, KY 40536, USA.

Superfund Research Center, University of Kentucky, Lexington, KY 40536, USA.

Publication information

Metabolites. 2024 Oct 27;14(11):582. doi: 10.3390/metabo14110582.

Abstract

Background/Objectives: Predicting the biochemical pathway involvement of a compound could facilitate the interpretation of biological and biomedical research. Prior prediction approaches have largely focused on metabolism, training machine learning models to solely predict based on metabolic pathways. However, there are many other types of pathways in cells and organisms that are of interest to biologists. Methods: While several publications have made use of the metabolites and metabolic pathways available in the Kyoto Encyclopedia of Genes and Genomes (KEGG), we downloaded all the compound entries with pathway annotations available in the KEGG. From these data, we constructed a dataset where each entry contained features representing compounds combined with features representing pathways, followed by a binary label indicating whether the given compound is associated with the given pathway. We trained multi-layer perceptron binary classifiers on variations of this dataset. Results: The models trained on 6485 KEGG compounds and 502 pathways scored an overall mean Matthews correlation coefficient (MCC) performance of 0.847, a median MCC of 0.848, and a standard deviation of 0.0098. Conclusions: This performance on all 502 KEGG pathways represents a roughly 6% improvement over the performance of models trained on only the 184 KEGG metabolic pathways, which had a mean MCC of 0.800 and a standard deviation of 0.021. These results demonstrate the capability to effectively predict biochemical pathways in general, in addition to those specifically related to metabolism. Moreover, the improvement in the performance demonstrates additional transfer learning with the inclusion of non-metabolic pathways.
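The dataset layout and the evaluation metric described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The identifiers (`C_a`, `P_x`), the feature values, and the function names are hypothetical toy stand-ins, and pairing every compound with every pathway is an assumption the abstract does not state explicitly. The MCC function uses the standard confusion-matrix formula and, by a common convention, returns 0.0 when the denominator is zero.

```python
from itertools import product
from math import sqrt


def build_pair_dataset(compound_features, pathway_features, associations):
    """Build one row per (compound, pathway) pair.

    Each row is (compound features + pathway features, label), where the
    label is 1 if the pair appears in `associations` and 0 otherwise.
    Assumption: every compound is crossed with every pathway.
    """
    rows = []
    pairs = product(compound_features.items(), pathway_features.items())
    for (cid, cvec), (pid, pvec) in pairs:
        label = 1 if (cid, pid) in associations else 0
        rows.append((list(cvec) + list(pvec), label))
    return rows


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumed here): an undefined MCC is reported as 0.0.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical toy inputs, not real KEGG entries or real features.
compounds = {"C_a": [1, 0, 1], "C_b": [0, 1, 1]}
pathways = {"P_x": [1, 0], "P_y": [0, 1]}
known_pairs = {("C_a", "P_x"), ("C_b", "P_y")}

dataset = build_pair_dataset(compounds, pathways, known_pairs)

# Relative improvement implied by the mean MCC values in the abstract.
relative_improvement = (0.847 - 0.800) / 0.800
```

With the reported means, `relative_improvement` is 0.05875, which matches the "roughly 6%" figure. In practice a library implementation such as scikit-learn's `matthews_corrcoef` would normally replace the hand-written `mcc`.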

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e1c/11596622/0657e0e3002d/metabolites-14-00582-g001.jpg
