Hsu Kuang-Cheng, Wang Pei-Hua, Su Bo-Han, Tseng Yufeng Jane
Department of Computer Science and Information Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Da'an Dist., Taipei City 106319, Taiwan.
Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Da'an Dist., Taipei City 106319, Taiwan.
Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf392.
P-glycoprotein (P-gp), a key member of the ATP-binding cassette (ABC) transporter family, plays a significant role in drug absorption and distribution by binding to diverse xenobiotics and actively transporting them out of cells. Given P-gp's widespread expression, including its critical presence at the blood-brain barrier, identifying whether a compound functions as a P-gp substrate or inhibitor is essential in drug development to evaluate its ability to penetrate the central nervous system. However, most studies on P-gp focus on inhibitor models rather than substrate models. This study presents a graph neural network approach to predicting P-gp substrates, using graph convolutional networks, AttentiveFP, and an ensemble model. On a dataset of 1995 drug molecules (1202 substrates, 793 nonsubstrates), AttentiveFP outperformed traditional methods, achieving an ROC-AUC of 0.848 and an accuracy of 0.815. Integrated gradients analysis identified 20 key substructures associated with P-gp substrates. Most notably, the top four confer a >70% probability of substrate classification and can be used for quick assessment in the future. This interpretable framework enhances P-gp prediction and supports broader drug development efforts.
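To put the reported accuracy in context, the class counts given above imply a majority-class baseline. The following is a minimal sketch of that arithmetic only; the variable names are illustrative and the baseline comparison is an assumption for illustration, not something the abstract reports.

```python
# Dataset counts as reported in the abstract.
n_substrates = 1202
n_nonsubstrates = 793
n_total = n_substrates + n_nonsubstrates

# Accuracy of a trivial classifier that labels every molecule
# with the majority class (here: substrate). Illustrative only.
majority_baseline = max(n_substrates, n_nonsubstrates) / n_total

# AttentiveFP accuracy as reported in the abstract.
reported_accuracy = 0.815
gain = reported_accuracy - majority_baseline

print(f"total molecules: {n_total}")
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
print(f"gain over baseline: {gain:.3f}")
```

Under these counts, always predicting "substrate" would score about 0.60, so the reported 0.815 is roughly 0.21 above that trivial baseline.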