Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada T6G 2E8.
Department of Biological Science, University of Alberta, Edmonton, Alberta, Canada T6G 2E9.
J Chem Inf Model. 2021 Jun 28;61(6):3128-3140. doi: 10.1021/acs.jcim.1c00144. Epub 2021 May 26.
Metabolism prediction is a cheminformatic task of autonomously predicting the set of metabolic byproducts produced from a specified molecule and a set of enzymes or reactions. Here, we describe a novel machine-learned cytochrome P450 (CYP450) metabolism prediction suite, called CyProduct, that accurately predicts metabolic byproducts for a specified molecule and a human CYP450 isoform. It includes three modules: (1) CypReact, a tool that predicts whether the query compound reacts with a given CYP450 enzyme; (2) CypBoM, a tool that accurately predicts the "bond site" of the reaction (i.e., which specific bonds within the query molecule react with the CYP isoform); and (3) MetaboGen, a tool that generates the metabolic byproducts based on CypBoM's bond-site prediction. CyProduct predicts metabolic biotransformation products for each of the nine most important human CYP450 enzymes. CypBoM uses an important new concept called "bond of metabolism" (BoM), which extends the traditional "site of metabolism" (SoM) by specifying the set of chemical bonds that are modified or formed in a metabolic reaction (rather than the specific atom). We created a BoM database for 1845 CYP450-mediated Phase I reactions, then used it to train the CypBoM Predictor to predict the reactive bond locations on substrate molecules. CypBoM Predictor's cross-validated Jaccard score for reactive bond prediction ranged from 0.380 to 0.452 over the nine CYP450 enzymes. Over variants of a test set of 68 known CYP450 substrates and 30 nonreactants, CyProduct outperformed the other packages, including ADMET Predictor, BioTransformer, and GLORY, by an average of 200% (with respect to Jaccard score) in terms of predicting metabolites. The CyProduct suite and the data sets are freely available at https://bitbucket.org/wishartlab/cyproduct/src/master/.
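Both reported evaluations rely on the Jaccard score, which compares a predicted set P with an observed set O as |P ∩ O| / |P ∪ O|. The following is a minimal sketch of that calculation, not code from the CyProduct suite. The function name, the representation of a bond as a pair of atom indices, and the convention that two empty sets score 1.0 are all assumptions made for illustration; the abstract does not state how empty predictions (for example, for nonreactants) are scored.

```python
def jaccard(predicted, observed):
    """Return |P ∩ O| / |P ∪ O| for two collections of hashable items."""
    p, o = set(predicted), set(observed)
    if not p and not o:
        # Assumption: two empty sets are treated as perfect agreement.
        return 1.0
    return len(p & o) / len(p | o)


# Hypothetical example: each bond is written as a pair of atom indices.
predicted_bonds = {(1, 2), (4, 5), (7, 8)}
observed_bonds = {(1, 2), (7, 8), (9, 10)}

# Two bonds are shared and four distinct bonds appear overall.
score = jaccard(predicted_bonds, observed_bonds)
print(round(score, 3))
```

The same function applies unchanged to metabolite prediction if each set holds a canonical identifier per metabolite instead of a bond.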