Zhang Yixue, Wu Jialu, Kang Yu, Hou Tingjun
College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China.
Polytechnic Institute of Zhejiang University, Zhejiang University, Hangzhou, 310015, China.
J Pharm Anal. 2025 Aug;15(8):101313. doi: 10.1016/j.jpha.2025.101313. Epub 2025 Apr 16.
P-glycoprotein (P-gp) is a transmembrane protein widely involved in the absorption, distribution, metabolism, excretion, and toxicity (ADMET) of drugs in the human body. Accurate prediction of P-gp inhibitors and substrates is crucial for drug discovery and toxicological assessment. However, existing models rely on limited molecular information, which leads to suboptimal performance in predicting P-gp inhibitors and substrates. To overcome this challenge, we compiled an extensive dataset from public databases and the literature, comprising 5,943 P-gp inhibitors and 4,018 substrates; the dataset is notable for its size, quality, and structural uniqueness. In addition, we curated two external test sets to validate the model's generalization capability. We then developed MC-PGP, a multimodal graph contrastive learning (GCL) model for predicting P-gp inhibitors and substrates. The framework uses an attention-based fusion strategy to integrate three types of features, derived from Simplified Molecular Input Line Entry System (SMILES) sequences, molecular fingerprints, and molecular graphs, into a unified molecular representation. We further employed a GCL approach that enhances structural representations by aligning local and global structures. Extensive experiments demonstrate the superior performance of MC-PGP: compared with 12 state-of-the-art methods, it improves the area under the receiver operating characteristic curve (AUC-ROC) by 9.82% and 10.62% on the external P-gp inhibitor and external P-gp substrate datasets, respectively. Moreover, interpretability analysis of all three molecular feature types offers comprehensive and complementary insights, showing that MC-PGP effectively identifies key functional groups involved in P-gp interactions. These chemically intuitive insights provide valuable guidance for the design and optimization of drug candidates.
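The abstract does not specify how the attention-based fusion is computed. The following is a minimal sketch of one common scheme (additive attention pooling over per-modality embeddings), assuming each encoder (SMILES, fingerprint, graph) already outputs a vector of the same dimension d. The function `attention_fuse` and the parameters `W`, `b`, `q` are hypothetical illustrations, not MC-PGP's actual implementation; in a real model they would be learned rather than drawn at random.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(modalities, W, b, q):
    """Hypothetical additive-attention fusion.

    modalities: (M, d) array, one row per modality embedding
    W: (d, h), b: (h,), q: (h,) attention parameters (assumed learned)
    Returns the fused (d,) vector and the (M,) attention weights.
    """
    scores = np.tanh(modalities @ W + b) @ q  # one relevance score per modality
    alpha = softmax(scores)                   # weights are positive and sum to 1
    fused = alpha @ modalities                # weighted sum of modality embeddings
    return fused, alpha

# Usage with random stand-ins for the three encoder outputs
rng = np.random.default_rng(0)
d, h = 8, 4
smiles_vec, fp_vec, graph_vec = rng.normal(size=(3, d))
stack = np.stack([smiles_vec, fp_vec, graph_vec])
W, b, q = rng.normal(size=(d, h)), np.zeros(h), rng.normal(size=h)
fused, alpha = attention_fuse(stack, W, b, q)
```

Because the weights form a convex combination, the fused vector stays in the span of the modality embeddings, and `alpha` can be inspected to see which modality dominates for a given molecule.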
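The abstract states only that the GCL component aligns local and global structures. A common way to express such an alignment is an InfoNCE-style objective in which each node (local) embedding should be more similar to the summary of its own graph (global) than to the summaries of other graphs in the batch. The sketch below computes that loss value (forward pass only, no gradients), assuming mean pooling as the graph readout and a temperature `tau`; the function name, the readout, and `tau=0.5` are all illustrative assumptions, not MC-PGP's published objective.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def local_global_infonce(node_emb, graph_ids, tau=0.5):
    """Hypothetical local-global InfoNCE loss (forward computation only).

    node_emb: (N, d) node-level (local) embeddings for a batch of graphs
    graph_ids: (N,) int array mapping each node to its graph 0..G-1
    Assumes every graph in the batch has at least one node.
    """
    G = int(graph_ids.max()) + 1
    # Global summary per graph: mean of its node embeddings (assumed readout)
    global_emb = np.zeros((G, node_emb.shape[1]))
    np.add.at(global_emb, graph_ids, node_emb)
    global_emb /= np.bincount(graph_ids, minlength=G)[:, None]
    # Cosine similarities between every node and every graph summary
    logits = l2_normalize(node_emb) @ l2_normalize(global_emb).T / tau  # (N, G)
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pair: each node with its own graph's summary
    return -log_prob[np.arange(len(graph_ids)), graph_ids].mean()

# Toy batch: two graphs whose nodes point in clearly different directions
emb = np.array([[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]])
loss_aligned = local_global_infonce(emb, np.array([0, 0, 1, 1]))
loss_mismatched = local_global_infonce(emb, np.array([0, 1, 0, 1]))
```

When the node-to-graph assignment matches the structure of the embeddings, the loss is low; scrambling the assignment raises it, which is the signal a contrastive objective of this kind uses to pull local and global views together.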