Wang Yang, Shi Zanyu, Weerawarna Pathum, Huang Kun, Richardson Timothy, Wang Yijie
Computer Science Department, Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington, Bloomington, Indiana, USA.
Department of Biostatistics and Health Data Science, School of Medicine, Indiana University, Indianapolis, Indiana, USA.
J Comput Biol. 2025 Jul;32(7):632-645. doi: 10.1089/cmb.2025.0074. Epub 2025 Jun 12.
Explainable Graph Neural Networks (GNNs) have been developed and applied to drug-protein binding prediction to identify the key chemical structures in a drug that actively interact with the target protein. However, the key structures identified by current explainable GNN models are typically chemically invalid. Furthermore, a threshold must be manually selected to separate the key structures from the rest of the molecule. To overcome these limitations, we propose SLGNN, which applies Sparse Learning to Graph Neural Networks. SLGNN represents a drug molecule as a chemical-substructure-based graph and incorporates the generalized fused lasso into message-passing algorithms to identify connected subgraphs that are critical for drug-protein binding prediction. Because the graph is built from chemical substructures, any subgraph of a drug identified by SLGNN is guaranteed to be a chemically valid structure. These subgraphs can be further interpreted as the key chemical structures through which the drug binds to the target protein. We test SLGNN and state-of-the-art competing methods on three real-world drug-protein binding datasets and demonstrate that the key structures identified by SLGNN are chemically valid and have greater predictive power than those identified by the competing methods. Our code is available at https://github.com/yw109iu/Explainable_GNN.
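To make the role of the generalized fused lasso concrete, the following is a minimal sketch of the standard form of that penalty on a graph, together with a helper that reads off connected subgraphs from the nonzero weights. It is not taken from the SLGNN code: the function names, the parameters `lam_sparse` and `lam_fuse`, the toy weights, and the toy edge list are all hypothetical, and the abstract does not specify how SLGNN couples the penalty with message passing.

```python
import numpy as np


def generalized_fused_lasso_penalty(w, edges, lam_sparse=1.0, lam_fuse=1.0):
    """Standard generalized fused lasso penalty on a graph (illustrative only).

    w      : one weight per node (here, per chemical substructure).
    edges  : list of (i, j) pairs linking adjacent substructures.
    The L1 term pushes weights to exactly zero; the fusion term pushes
    neighbouring weights toward equal values.
    """
    w = np.asarray(w, dtype=float)
    sparsity = np.abs(w).sum()
    fusion = sum(abs(w[i] - w[j]) for i, j in edges)
    return lam_sparse * sparsity + lam_fuse * fusion


def selected_components(w, edges, tol=1e-8):
    """Connected components of the nodes whose weight is nonzero.

    tol is only a numerical tolerance for "exactly zero", not a tuned
    importance threshold.
    """
    active = {i for i, v in enumerate(w) if abs(v) > tol}
    adj = {i: set() for i in active}
    for i, j in edges:
        if i in active and j in active:
            adj[i].add(j)
            adj[j].add(i)
    seen, comps = set(), []
    for start in sorted(active):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # depth-first search over active nodes
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(sorted(comp))
    return comps


# Hypothetical toy molecule: five substructures connected in a chain.
toy_weights = [0.0, 0.8, 0.8, 0.0, 0.5]
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]

penalty = generalized_fused_lasso_penalty(toy_weights, toy_edges)
components = selected_components(toy_weights, toy_edges)
```

In this toy case the selected nodes form two connected groups, substructures 1 and 2 together and substructure 4 alone. Because the nodes are whole chemical substructures rather than single atoms, each such group corresponds to a chemically meaningful fragment, which is the property the abstract attributes to the substructure-based graph.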