Global DMPK, Takeda Pharmaceutical Company Limited, 26-1 Muraoka-Higashi, 2-Chome, Fujisawa, Kanagawa, 251-8555, Japan.
Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayamacho, Ikoma, Nara, 630-0101, Japan.
AAPS J. 2023 Sep 12;25(5):88. doi: 10.1208/s12248-023-00853-y.
Multidrug resistance protein 1 (MDR1) and breast cancer resistance protein (BCRP) play important roles in drug absorption and distribution. Computational prediction of substrates for both transporters can help shorten drug discovery timelines. This study aimed to predict the efflux activity of MDR1 and BCRP using multiple machine learning approaches based on molecular descriptors and graph convolutional networks (GCNs). In vitro efflux activity was determined using MDR1- and BCRP-expressing cells. Predictive performance was assessed using an in-house dataset with a chronological split and an external dataset. Of the 25 descriptor-based machine learning methods evaluated, CatBoost and support vector regression showed the best predictive performance for MDR1 and BCRP efflux activities, respectively, as judged by the coefficient of determination (R²). The single-task GCN performed slightly worse than descriptor-based prediction on the in-house dataset. For both approaches, the percentage of compounds predicted within twofold of the observed values was lower in the external dataset than in the in-house dataset. Multi-task GCN did not show any improvement, whereas multimodal GCN increased the predictive performance for BCRP efflux activity compared with single-task GCN. Furthermore, the ensemble of descriptor-based machine learning and GCN achieved the highest predictive performance, with R² values of 0.706 and 0.587 for MDR1 and BCRP, respectively, in the time-split test sets. This result suggests that the two approaches to representing molecular structures capture complementary molecular characteristics. Our study demonstrated that predictive models using advanced machine learning approaches are beneficial for identifying potential substrate liability for both MDR1 and BCRP.
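The evaluation described in the abstract (a chronological split, the coefficient of determination, the share of predictions within twofold of the observed value, and an ensemble of two model families) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the `date` field, the 20% test fraction, and the use of a simple arithmetic mean to combine the two models are hypothetical, since the abstract does not state how the ensemble was formed or how the split was sized.

```python
def chronological_split(records, test_fraction=0.2):
    """Hold out the most recent records as the test set.

    Assumes each record is a dict with an ISO-8601 'date' string
    (hypothetical field name), which sorts correctly as text.
    """
    ordered = sorted(records, key=lambda r: r["date"])
    n_test = max(1, round(len(ordered) * test_fraction))
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fraction_within_twofold(observed, predicted):
    """Share of predictions with 0.5 <= predicted/observed <= 2.

    Assumes positive, linear-scale values such as efflux ratios.
    """
    hits = sum(1 for o, p in zip(observed, predicted) if 0.5 <= p / o <= 2.0)
    return hits / len(observed)


def ensemble_mean(pred_descriptor, pred_gcn):
    """Average two models' predictions (assumed combination rule)."""
    return [(a + b) / 2.0 for a, b in zip(pred_descriptor, pred_gcn)]
```

In use, the descriptor-based model and the GCN would each be trained on the earlier portion of the split, their test-set predictions combined with `ensemble_mean`, and the combined predictions scored with `r_squared` and `fraction_within_twofold`.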