Zhao Peng-Cheng, Wei Xue-Xin, Wang Qiong, Wang Hao-Yang, Du Bing-Xue, Li Jia-Ning, Zhu Bei, Yu Hui, Shi Jian-Yu
School of Life Sciences, Northwestern Polytechnical University, Xi'an, 710072, China.
School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.
Interdiscip Sci. 2025 Jan 6. doi: 10.1007/s12539-024-00681-4.
Metabolism in vivo converts small molecules (e.g., drugs) into metabolites (new molecules), raising unexpected safety issues in drug development. However, determining metabolites by biological assays is costly. Recent computational methods offer promising alternatives by predicting possible metabolites. Rule-based methods use predefined rules derived from known reactions to infer metabolites; however, they cannot handle new metabolic reaction patterns. In contrast, rule-free methods treat metabolite generation as sequence-to-sequence machine translation. Nevertheless, they characterize molecular structures insufficiently and offer weak interpretability. To address these issues in rule-free methods, this manuscript proposes a novel metabolism type-aware graph generative framework (MTGGF) for molecular metabolite prediction. It adopts a two-stage learning process: pre-training on a large general chemical reaction dataset, followed by fine-tuning on three smaller type-specific metabolic reaction datasets. Its core, an elaborate graph-to-graph generative model, treats both atoms and bonds as vertices of a bipartite graph, so that each molecule is represented as a bipartite graph; this lets the model embed rich information about molecular structure and ensure the integrity of the generated metabolite structures. Comparisons with state-of-the-art methods demonstrate its superiority. Furthermore, an ablation study validates the contributions of its two graph encoding components and of its reaction-type-specific fine-tuned models. More importantly, case studies on five approved drugs, based on the interactive attention between a molecule and its metabolites, reveal crucial substructures specific to metabolism types. This framework is anticipated to facilitate the risk evaluation of drug metabolites. The code is available at https://github.com/zpczaizheli/Metabolite.
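The abstract describes molecules as bipartite graphs in which atoms and bonds are both vertices. The plain-Python sketch below illustrates that representation under stated assumptions. A toy molecule is given as element symbols plus `(i, j, bond_order)` triples. The names `BipartiteMolGraph`, `to_bipartite`, `is_well_formed` and `one_round_message_passing` are hypothetical and are not taken from the authors' repository. The untrained sum aggregation only stands in for whatever learned graph encoders MTGGF actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BipartiteMolGraph:
    """Hypothetical structure: atoms and bonds form two disjoint vertex sets."""
    atom_labels: list[str]  # one vertex per atom, labelled by element symbol
    bond_labels: list[int] = field(default_factory=list)  # one vertex per bond, labelled by bond order
    edges: list[tuple[int, int]] = field(default_factory=list)  # (atom_idx, bond_idx) incidence pairs

    def atom_neighbors(self, bond_idx: int) -> list[int]:
        """Return the atom vertices incident to a given bond vertex."""
        return [a for a, b in self.edges if b == bond_idx]


def to_bipartite(atoms: list[str], bonds: list[tuple[int, int, int]]) -> BipartiteMolGraph:
    """Turn each bond (i, j, order) into its own vertex linked to atoms i and j."""
    graph = BipartiteMolGraph(atom_labels=list(atoms))
    for k, (i, j, order) in enumerate(bonds):
        if i == j or not (0 <= i < len(atoms) and 0 <= j < len(atoms)):
            raise ValueError(f"bond {k} references invalid atoms ({i}, {j})")
        graph.bond_labels.append(order)
        graph.edges.append((i, k))
        graph.edges.append((j, k))
    return graph


def is_well_formed(graph: BipartiteMolGraph) -> bool:
    """Local integrity check: every bond vertex joins exactly two distinct atoms."""
    return all(
        len(set(graph.atom_neighbors(k))) == 2 for k in range(len(graph.bond_labels))
    )


def one_round_message_passing(
    graph: BipartiteMolGraph, atom_feats: list[float]
) -> tuple[list[float], list[float]]:
    """Toy atom -> bond -> atom sum aggregation with scalar features, no learned weights."""
    # Bond vertices start from their bond order, then absorb their two atoms' features.
    bond_feats = [float(order) for order in graph.bond_labels]
    for a, b in graph.edges:
        bond_feats[b] += atom_feats[a]
    # Atom vertices then absorb the updated features of every incident bond vertex.
    new_atom_feats = list(atom_feats)
    for a, b in graph.edges:
        new_atom_feats[a] += bond_feats[b]
    return new_atom_feats, bond_feats


# Usage: ethanol's heavy atoms C-C-O joined by two single bonds.
ethanol = to_bipartite(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
print(ethanol.edges)  # [(0, 0), (1, 0), (1, 1), (2, 1)]
print(is_well_formed(ethanol))  # True
# Atomic numbers as toy atom features.
print(one_round_message_passing(ethanol, [6.0, 6.0, 8.0]))  # ([19.0, 34.0, 23.0], [13.0, 15.0])
```

Making bonds explicit vertices lets bond-level information be updated alongside atom features. It also lets a generated graph be checked locally: a bond vertex with anything other than two distinct atom neighbours signals a broken structure. This is one plausible reading of the abstract's "integrity" claim, not a description of the paper's actual decoder.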