School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China.
Department of Computer Science, The University of Hong Kong, Hong Kong 999077, China.
Bioinformatics. 2022 Jun 24;38(Suppl 1):i325-i332. doi: 10.1093/bioinformatics/btac222.
MOTIVATION: During lead compound optimization, it is crucial to identify the pathways in which a drug-like compound is metabolized. Recently, machine learning-based methods have made encouraging progress in predicting potential metabolic pathways for drug-like compounds. However, they neglect the fact that metabolic pathways depend on each other. Moreover, they cannot adequately explain why compounds participate in specific pathways.
RESULTS: To address these issues, we propose a novel Multi-Label Graph Learning framework of Metabolic Pathway prediction boosted by pathway interdependence, called MLGL-MP, which contains a compound encoder, a pathway encoder and a multi-label predictor. The compound encoder learns compound embedding representations with graph neural networks. After a pathway dependence graph is constructed from re-trained word embeddings and pathway co-occurrences, the pathway encoder learns pathway embeddings with graph convolutional networks. After the compound embedding space is adapted into the pathway embedding space, the multi-label predictor measures the proximity of the two spaces to determine which pathways a compound participates in. A comparison with state-of-the-art methods on KEGG pathways demonstrates the superiority of MLGL-MP. Ablation studies reveal how three components contribute to the model: the pathway dependence, the adapter between compound embeddings and pathway embeddings, and the pre-training strategy. Furthermore, a case study illustrates the interpretability of MLGL-MP by indicating crucial substructures in a compound that are significantly associated with the metabolic pathways it participates in. It is anticipated that this work can boost metabolic pathway prediction in drug discovery.
AVAILABILITY AND IMPLEMENTATION: The code and data underlying this article are freely available at https://github.com/dubingxue/MLGL-MP.
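The data flow described in the abstract (a pathway dependence graph built from co-occurrences, a graph convolution over pathway features, an adapter, and a proximity-based multi-label score) can be illustrated with a toy sketch. This is not the authors' implementation, which is in the linked repository. Everything below is an assumption made for illustration: the function names, the conditional-probability threshold of 0.4, row normalization of the adjacency matrix, a single graph-convolution layer, a linear adapter, dot-product proximity with a sigmoid, and the tiny hand-written matrices that stand in for learned weights, word embeddings and the compound encoder output.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def cooccurrence_adjacency(labels, threshold=0.4):
    """Assumed rule: edge i -> j if P(pathway j | pathway i) >= threshold.
    `labels` holds one 0/1 row per compound and one column per pathway."""
    n = len(labels[0])
    counts = [sum(row[i] for row in labels) for i in range(n)]
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop so each pathway keeps its own features
        for j in range(n):
            if i == j or counts[i] == 0:
                continue
            both = sum(1 for row in labels if row[i] and row[j])
            if both / counts[i] >= threshold:
                adj[i][j] = 1.0
    return adj

def row_normalize(adj):
    """Divide each row by its sum (one of several common normalizations)."""
    return [[v / sum(row) for v in row] for row in adj]

def gcn_layer(adj_norm, features, weight):
    """One graph-convolution step: ReLU(A_norm @ X @ W)."""
    hidden = matmul(matmul(adj_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in hidden]

def predict(compound_emb, adapter, pathway_emb):
    """Map the compound embedding into pathway space with a linear adapter,
    then score each pathway by sigmoid(dot product)."""
    adapted = matmul([compound_emb], adapter)[0]
    logits = [sum(a * p for a, p in zip(adapted, row)) for row in pathway_emb]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

# Toy data: 4 compounds x 3 pathways; pathways 0 and 1 often co-occur.
labels = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0]]
adj = cooccurrence_adjacency(labels)

# Stand-ins for pathway word embeddings and a learned GCN weight matrix.
pathway_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gcn_weight = [[1.0, 0.0], [0.0, 1.0]]
pathway_emb = gcn_layer(row_normalize(adj), pathway_features, gcn_weight)

# Stand-ins for a compound encoder output and a learned adapter.
compound_emb = [1.0, 2.0]
adapter = [[1.0, 0.0], [0.0, -1.0]]
scores = predict(compound_emb, adapter, pathway_emb)
print(scores)  # one probability-like score per pathway
```

In this toy setup, pathways 0 and 1 are connected in both directions, so the graph convolution averages their features and they receive the same embedding. That is a small instance of how label dependence can shape the pathway representations before compounds are scored against them.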