Department of Computer Science and Engineering, Indian Institute of Information Technology, Bhopal, MP 462003, India.
Department of Pediatrics and Bioengineering, University of California San Diego, La Jolla, CA 92093, USA.
J Theor Biol. 2024 Feb 7;578:111684. doi: 10.1016/j.jtbi.2023.111684. Epub 2023 Dec 3.
Diverse metabolic pathways are fundamental to all living organisms: they harvest energy, synthesize biomass components, produce molecules that interact with the microenvironment, and neutralize toxins. Although new metabolites and pathways continue to be discovered, predicting pathways for new metabolites remains challenging. Elucidating these pathways can take a great deal of time; as a result, according to the Human Metabolome Database (HMDB), only 60% of metabolites are assigned to pathways. Here, we present an approach that identifies pathways based on metabolite structure. We extracted 201 features from SMILES annotations and identified new metabolites from PubMed abstracts and HMDB. After applying clustering algorithms to both groups of features, we quantified correlations between metabolites and found that the clusters accurately linked 92% of known metabolites to their respective pathways. This approach could therefore be valuable for predicting metabolic pathways for new metabolites.
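The workflow described in the abstract (structural features from SMILES, correlation between metabolites, clustering, pathway assignment) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the nine token-count features stand in for the 201 features, which the abstract does not list; Pearson correlation with a 0.9 cutoff and single-linkage grouping stand in for the unspecified clustering algorithms; and the labels `pathway_A` and `pathway_B` are placeholders rather than real pathway annotations.

```python
from math import sqrt

# Hypothetical toy feature set: raw character counts in the SMILES string.
# Crude by design (e.g. "C" would also match the C in "Cl").
FEATURE_TOKENS = ["C", "N", "O", "S", "P", "c", "=", "(", "1"]

# Known metabolites: name -> (SMILES, placeholder pathway label).
KNOWN = {
    "alanine": ("CC(N)C(=O)O", "pathway_A"),
    "glycine": ("NCC(=O)O", "pathway_A"),
    "benzene": ("c1ccccc1", "pathway_B"),
    "toluene": ("Cc1ccccc1", "pathway_B"),
    "phenol": ("Oc1ccccc1", "pathway_B"),
}


def smiles_features(smiles):
    """Return a vector of token counts for one SMILES string."""
    return [float(smiles.count(tok)) for tok in FEATURE_TOKENS]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def cluster_by_correlation(names, vectors, threshold=0.9):
    """Single-linkage grouping: merge any pair whose correlation >= threshold."""
    parent = list(range(len(names)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if pearson(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


def predict_pathway(smiles):
    """Assign the pathway label of the most correlated known metabolite."""
    query = smiles_features(smiles)
    best = max(KNOWN, key=lambda k: pearson(query, smiles_features(KNOWN[k][0])))
    return KNOWN[best][1]


names = list(KNOWN)
vectors = [smiles_features(KNOWN[n][0]) for n in names]
clusters = cluster_by_correlation(names, vectors)
print(clusters)
print(predict_pathway("NC(CO)C(=O)O"))  # serine, treated here as an unassigned metabolite
```

In practice a cheminformatics toolkit would replace the token counts with proper molecular descriptors, and the correlation threshold would need to be tuned against metabolites whose pathways are already known.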