Li Dan, Chen Lei, Li Youyong, Tian Sheng, Sun Huiyong, Hou Tingjun
College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China.
Mol Pharm. 2014 Mar 3;11(3):716-26. doi: 10.1021/mp400450m. Epub 2014 Feb 18.
P-glycoprotein (P-gp) actively transports a wide variety of chemically diverse compounds out of cells. It is closely associated with the ADMET properties of drugs and drug candidates and, moreover, plays a major role in multidrug resistance (MDR), which leads to the failure of chemotherapy in cancer treatment. Recognizing potential P-gp substrates at the early stages of drug discovery is therefore important. Here, we compiled an extensive data set containing 423 P-gp substrates and 399 nonsubstrates, which is the largest P-gp substrate/nonsubstrate data set yet published. Comparison of the distributions of eight important physicochemical properties for the substrates and nonsubstrates reveals that molecular weight and molecular solubility are the informative attributes differentiating P-gp substrates from nonsubstrates. Examination of the distributions of the eight physicochemical properties for 735 P-gp inhibitors and 423 substrates shows that inhibitors are significantly more hydrophobic than substrates, while substrates tend to have more H-bond donors than inhibitors. Classification models based on simple molecular properties, topological descriptors, and molecular fingerprints were then developed using the naive Bayesian classification technique. The best naive Bayesian classifier yields a Matthews correlation coefficient of 0.824 and a prediction accuracy of 91.2% for the training set in 5-fold cross-validation, and a Matthews correlation coefficient of 0.667 and a prediction accuracy of 83.5% for the test set containing 200 molecules. Analysis of the important structural fragments identified by the Bayesian classifier shows that H-bond acceptors arranged in distinct spatial patterns, together with molecular flexibility, are essential for P-gp substrate-likeness, which affords a deeper understanding of the molecular basis of substrate/P-gp interaction.
Finally, the reasons for mispredictions are discussed. The results suggest that the presented classifier could be used as a reliable virtual screening tool for identifying potential P-gp substrates.
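
The workflow summarized above (binary structural features, a naive Bayesian classifier, and evaluation by accuracy and Matthews correlation coefficient) can be illustrated with a minimal sketch. This is not the authors' implementation: it is a generic Bernoulli naive Bayes with Laplace smoothing, and the function names, the `alpha` parameter, and the four-bit toy "fingerprints" are all hypothetical and chosen only for illustration.

```python
import math


def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on binary feature vectors.

    X: list of equal-length lists of 0/1 bits (hypothetical fingerprints).
    y: list of labels, 1 = substrate, 0 = nonsubstrate.
    alpha: Laplace smoothing constant (an assumption, not from the abstract).
    """
    n_features = len(X[0])
    model = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        log_prior = math.log(len(rows) / len(X))
        # Smoothed probability that each bit is set within this class.
        p_on = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[label] = (log_prior, p_on)
    return model


def predict(model, x):
    """Return the label with the highest log posterior for bit vector x."""
    scores = {}
    for label, (log_prior, p_on) in model.items():
        score = log_prior
        for bit, p in zip(x, p_on):
            score += math.log(p if bit else 1.0 - p)
        scores[label] = score
    return max(scores, key=scores.get)


def accuracy_and_mcc(y_true, y_pred):
    """Compute accuracy and Matthews correlation coefficient for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, report MCC as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    acc = (tp + tn) / len(pairs)
    return acc, mcc


# Toy data only: six made-up 4-bit vectors, not real molecules.
X_toy = [
    [1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 0],  # labeled "substrate"
    [0, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1],  # labeled "nonsubstrate"
]
y_toy = [1, 1, 1, 0, 0, 0]

toy_model = train_bernoulli_nb(X_toy, y_toy)
toy_pred = [predict(toy_model, x) for x in X_toy]
toy_acc, toy_mcc = accuracy_and_mcc(y_toy, toy_pred)
```

In practice the bit vectors would come from a cheminformatics toolkit, and the metrics would be computed on held-out folds or an external test set rather than on the training data as in this toy example.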