Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China.
Chem Res Toxicol. 2024 Sep 16;37(9):1535-1548. doi: 10.1021/acs.chemrestox.4c00199. Epub 2024 Aug 28.
Cytochromes P450 (P450s or CYPs) are the most important phase I metabolic enzymes in the human body and are responsible for metabolizing ∼75% of clinically used drugs. P450-mediated metabolism is also closely associated with the formation of toxic metabolites and with drug-drug interactions. It is therefore important to predict, at an early stage of drug development, whether a compound is a substrate of a given P450. In this study, we built multitask learning models to simultaneously predict the substrates of five major drug-metabolizing P450 enzymes, namely CYP3A4, 2C9, 2C19, 2D6, and 1A2, based on collected substrate data sets. Compared with the single-task models and conventional machine learning models, the multitask model based on fingerprints and graph neural networks achieved superior performance, with an average AUC of 90.8% on the test set. Notably, the multitask model performed well on the enzymes with smaller substrate data sets, such as CYP1A2, 2C9, and 2C19. In addition, Shapley additive explanations and the attention mechanism were used to reveal specific substructures associated with P450 substrates, which were further confirmed and complemented by a substructure mining tool and the literature.
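To make the reported metric concrete, the following is a minimal sketch of how an average AUC over the five P450 tasks could be computed when not every compound is labeled for every enzyme. It is an illustration under stated assumptions, not the authors' evaluation code: the abstract does not say how missing labels were handled or how the per-task AUCs were averaged, so the use of `None` for unmeasured labels, the unweighted mean, and the function names `roc_auc` and `mean_task_auc` are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# The five enzymes named in the abstract, used here as column order (an assumption).
TASKS = ["CYP3A4", "CYP2C9", "CYP2C19", "CYP2D6", "CYP1A2"]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random substrate (label 1) is scored
    higher than a random non-substrate (label 0); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_task_auc(
    labels: Sequence[Sequence[Optional[int]]],
    scores: Sequence[Sequence[float]],
) -> Tuple[float, Dict[str, float]]:
    """labels[i][t] is 1, 0, or None (compound i not measured for task t);
    scores[i][t] is the predicted substrate probability.
    Returns the unweighted mean AUC and the per-task AUCs."""
    per_task: Dict[str, float] = {}
    for t, name in enumerate(TASKS):
        ys: List[int] = []
        ss: List[float] = []
        for row_y, row_s in zip(labels, scores):
            if row_y[t] is not None:  # skip compounds without a label for this enzyme
                ys.append(row_y[t])
                ss.append(row_s[t])
        per_task[name] = roc_auc(ys, ss)
    return sum(per_task.values()) / len(per_task), per_task
```

Each row of `labels` and `scores` corresponds to one compound and each column to one enzyme in `TASKS`. Averaging per-task AUCs without weighting gives the smaller data sets (such as CYP1A2, 2C9, and 2C19) the same influence on the summary number as the larger ones, which is one reasonable choice among several.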