Arimoto Rieko, Prasad Madhu-Ashni, Gifford Eric M
Pfizer Global Research and Development, Ann Arbor, Michigan, USA.
J Biomol Screen. 2005 Apr;10(3):197-205. doi: 10.1177/1087057104274091.
Computational models of cytochrome P450 3A4 inhibition were developed from high-throughput screening data for 4470 proprietary compounds. Multiple models discriminating inhibitors (IC50 < 3 µM) from noninhibitors were generated using several machine-learning algorithms (recursive partitioning [RP], Bayesian classifier, logistic regression, k-nearest neighbor, and support vector machine [SVM]) with structural fingerprints and topological indices as descriptors. Nineteen models were evaluated by internal 10-fold cross-validation and by an independent test set. The three most predictive models, Barnard Chemical Information (BCI) fingerprint/SVM, MDL keyset/SVM, and topological indices/RP, correctly classified 249, 248, and 236 of the 291 noninhibitors and 135, 137, and 147 of the 179 inhibitors in the validation set, for overall accuracies of 82%, 82%, and 81%, respectively. An investigation of the applicability of the BCI/SVM model found a strong correlation between predictive performance and structural similarity to the training set. Using the Tanimoto similarity index as a confidence measure for the predictions, the limit of extrapolation for the BCI/SVM model was a similarity of 0.7. A consensus of the 3 best models further improved predictive capability (kappa = 0.65, accuracy = 83%). The consensus model could also be tuned to minimize either false positives or false negatives, depending on the emphasis of the screening.
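The abstract reports its results as counts of correctly classified compounds, overall accuracy, Cohen's kappa, Tanimoto similarity to the training set, and a three-model consensus. The Python sketch below shows how these quantities are conventionally computed. It is illustrative and is not the authors' code. The fingerprints are hypothetical sets of on-bit indices. The `min_votes` voting rule is an assumption, because the abstract does not say how the consensus was formed or tuned. The kappa printed for BCI/SVM is derived here from the reported counts and is not a figure stated in the abstract.

```python
from __future__ import annotations

from typing import Iterable, Sequence


def accuracy(tn: int, n_neg: int, tp: int, n_pos: int) -> float:
    """Fraction of test compounds classified correctly."""
    return (tn + tp) / (n_neg + n_pos)


def cohen_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Cohen's kappa for a 2x2 confusion matrix (agreement beyond chance)."""
    n = tp + fp + fn + tn
    p_obs = (tp + tn) / n
    # Chance agreement from the marginal totals of predicted vs. actual classes.
    p_exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_similarity(query: set[int], training: Iterable[set[int]]) -> float:
    """Highest Tanimoto similarity of a query compound to any training compound."""
    return max((tanimoto(query, t) for t in training), default=0.0)


def consensus(votes: Sequence[bool], min_votes: int = 2) -> bool:
    """Hypothetical voting rule: call an inhibitor if at least `min_votes` models agree.

    min_votes=2 is a simple majority of three models; 3 (unanimity) trades sensitivity
    for fewer false positives; 1 (any model) trades specificity for fewer false negatives.
    """
    return sum(votes) >= min_votes


if __name__ == "__main__":
    # Validation-set counts from the abstract: (correct of 291 noninhibitors, correct of 179 inhibitors).
    reported = {"BCI/SVM": (249, 135), "MDL/SVM": (248, 137), "TI/RP": (236, 147)}
    for name, (tn, tp) in reported.items():
        print(name, round(100 * accuracy(tn, 291, tp, 179)))  # 82, 82, 81
    # BCI/SVM confusion matrix implied by the counts: FP = 291 - 249, FN = 179 - 135.
    print(round(cohen_kappa(tp=135, fp=42, fn=44, tn=249), 2))  # 0.61
    # Hypothetical fingerprints: similarity 0.6 falls below the 0.7 extrapolation limit.
    sim = max_similarity({1, 2, 3, 4}, [{1, 2, 3, 5}, {7, 8}])
    print(sim, sim >= 0.7)  # 0.6 False
```

Recomputing accuracy from the reported counts reproduces the 82%, 82%, and 81% figures. Setting the consensus threshold is one simple way to bias a screen toward fewer false positives or fewer false negatives. The study may have used a different scheme.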