State Key Laboratory of Biotherapy, West China Hospital, West China Medical School, Sichuan University, Chengdu, Sichuan 610041, PR China.
Comput Biol Med. 2011 Nov;41(11):1006-13. doi: 10.1016/j.compbiomed.2011.08.009. Epub 2011 Sep 14.
Breast cancer resistance protein (BCRP) is a key multidrug resistance protein that significantly influences the therapeutic effects of many drugs, particularly anticancer drugs. Distinguishing BCRP substrates from non-substrates is therefore important both for clinical use and for drug discovery and development. In this study, a model for predicting BCRP substrates and non-substrates was developed using a modified support vector machine (SVM) method, GA-CG-SVM. The overall prediction accuracy of the GA-CG-SVM model is 91.3% for the training set and 85.0% for an independent validation set. For comparison, two other machine learning methods, the C4.5 decision tree (C4.5 DT) and k-nearest neighbor (k-NN), were also used to build prediction models. The results show that the GA-CG-SVM model is significantly superior to the C4.5 DT and k-NN models in prediction accuracy. In summary, the GA-CG-SVM model predicts BCRP substrates and non-substrates accurately enough to serve as a screening tool for identifying them.
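The overall prediction accuracy used above to compare the models is the fraction of compounds classified correctly. A minimal Python sketch of that calculation follows. The function names, the label coding (1 = substrate, 0 = non-substrate), and the toy labels are illustrative assumptions. They are not the authors' code or data, and the two sets of predictions do not correspond to any model in the study.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for binary labels.

    Assumed coding (hypothetical): 1 = substrate, 0 = non-substrate.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 0:
            counts["tn"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        else:  # t == 1 and p == 0
            counts["fn"] += 1
    return counts


def overall_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Overall accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    if total == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    return (c["tp"] + c["tn"]) / total


# Toy validation labels and predictions from two hypothetical models.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
pred_model_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
pred_model_b = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]

for name, pred in [("model A", pred_model_a), ("model B", pred_model_b)]:
    print(f"{name}: overall accuracy = {overall_accuracy(y_true, pred):.1%}")
```

Computing the same metric on the same independent validation set for each model keeps the comparison like for like.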