Institute of Molecular and Cellular Biology, University of Leeds, Leeds, United Kingdom.
J Chem Inf Model. 2011 Feb 28;51(2):408-19. doi: 10.1021/ci100369f. Epub 2011 Feb 3.
Docking scoring functions are notoriously weak predictors of binding affinity. They typically assign a common set of weights to the individual energy terms that contribute to the overall energy score; however, these weights should be gene family dependent. In addition, they incorrectly assume that individual interactions contribute toward the total binding affinity in an additive manner. In reality, noncovalent interactions often depend on one another in a nonlinear manner. In this paper, we show how the use of support vector machines (SVMs), trained by associating sets of individual energy terms retrieved from molecular docking with the known binding affinity of each compound from high-throughput screening experiments, can be used to improve the correlation between known binding affinities and those predicted by the docking program eHiTS. We construct two prediction models: a regression model trained using IC50 values from BindingDB, and a classification model trained using active and decoy compounds from the Directory of Useful Decoys (DUD). Moreover, to address the issue of overrepresentation of negative data in high-throughput screening data sets, we have designed a multiple-planar SVM training procedure for the classification model. The increased performance that both SVMs give when compared with the original eHiTS scoring function highlights the potential for using nonlinear methods when deriving overall energy scores from their individual components. We apply the above methodology to train a new scoring function for direct inhibitors of Mycobacterium tuberculosis (M.tb) InhA. By combining ligand binding site comparison with the new scoring function, we propose that phosphodiesterase inhibitors can potentially be repurposed to target M.tb InhA. Our methodology may be applied to other gene families for which target structures and activity data are available, as demonstrated in the work presented here.
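The abstract does not spell out the multiple-planar training procedure, so the following is only a minimal sketch of one plausible reading: the abundant decoys are split into chunks roughly the size of the active set, one separating plane is trained per chunk against all actives, and the planes vote. Everything here is an assumption made for illustration. The two-component "energy term" vectors are synthetic, the function names (`train_linear_svm`, `train_multi_plane`, `predict`) are hypothetical, and a plain linear SVM fitted by hinge-loss subgradient descent stands in for the kernel SVMs the paper argues for, purely to keep the example free of external dependencies.

```python
import random


def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=50, seed=0):
    """Fit a linear SVM by subgradient descent on the regularised hinge loss.

    X: list of feature vectors; y: labels in {+1, -1}.
    Returns (w, b). Illustrative stand-in, not the paper's SVM.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # L2 shrinkage of the weights (the bias is left unregularised).
            w = [wj * (1.0 - eta * lam) for wj in w]
            if y[i] * score < 1.0:
                # Margin violated: step along the hinge subgradient.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def train_multi_plane(actives, decoys, seed=0):
    """Train one plane per decoy chunk, each against the full active set."""
    rng = random.Random(seed)
    shuffled = decoys[:]
    rng.shuffle(shuffled)
    size = len(actives)  # chunk size chosen so each sub-problem is balanced
    planes = []
    for start in range(0, len(shuffled), size):
        chunk = shuffled[start:start + size]
        X = actives + chunk
        y = [1] * len(actives) + [-1] * len(chunk)
        planes.append(train_linear_svm(X, y, seed=seed + len(planes)))
    return planes


def predict(planes, x):
    """Majority vote over all planes; a tie is reported as decoy (-1)."""
    votes = 0
    for w, b in planes:
        score = sum(wj * xj for wj, xj in zip(w, x)) + b
        votes += 1 if score > 0 else -1
    return 1 if votes > 0 else -1


# Synthetic, pre-scaled energy-term vectors (assumed, not real docking output):
# 10 actives with favourable (negative) terms, 40 decoys with unfavourable ones.
data_rng = random.Random(42)
actives = [[-3 + data_rng.uniform(-0.5, 0.5), -3 + data_rng.uniform(-0.5, 0.5)]
           for _ in range(10)]
decoys = [[3 + data_rng.uniform(-0.5, 0.5), 3 + data_rng.uniform(-0.5, 0.5)]
          for _ in range(40)]

planes = train_multi_plane(actives, decoys)
print(len(planes), predict(planes, [-3.0, -3.0]), predict(planes, [3.0, 3.0]))
```

Chunking the decoys keeps every sub-problem balanced, so no single plane is dominated by the negative class. Whether the authors combine their planes by voting, by averaging decision values, or in some other way is not stated in the abstract.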