Department of Chemistry and Biochemistry and Center for Nanoscience, University of Missouri-St. Louis, Saint Louis, Missouri, USA.
Proteins. 2020 Oct;88(10):1263-1270. doi: 10.1002/prot.25899. Epub 2020 May 25.
Ensemble docking provides an inexpensive way to account for receptor flexibility in molecular docking for virtual screening. Unfortunately, because no rigorous theory connects the docking scores from multiple structures to measured activity, researchers have not yet found effective ways to use these scores to classify compounds as active or inactive. This shortcoming has led to a decrease, rather than an increase, in classification performance when more structures are added to the ensemble. Previously, we suggested that machine learning, implemented in the form of a naïve Bayesian model, could alleviate this problem. However, the naïve Bayesian model assumes that the probabilities of observing the docking scores for different structures are independent. This approximation might prevent it from achieving even higher performance. In the work presented in this paper, we relaxed this approximation by using several other machine learning methods (k-nearest neighbors, logistic regression, support vector machine, and random forest) to improve ensemble docking. We found significant improvement.
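As a minimal sketch of the idea, assume each compound is represented by a vector of docking scores, one score per receptor structure in the ensemble. A k-nearest-neighbor classifier, one of the methods named above, can then label a compound from its joint score vector instead of treating each structure's score independently. The scores, labels, the function name `knn_predict`, and the choice of k below are hypothetical illustrations, not data or code from the paper.

```python
import math
from collections import Counter


def knn_predict(train_scores, train_labels, query, k=3):
    """Classify a compound from its vector of ensemble docking scores.

    train_scores: list of score vectors, one entry per receptor structure.
    train_labels: 1 for active, 0 for inactive (hypothetical encoding).
    query: score vector of the compound to classify.
    """
    if len(train_scores) != len(train_labels):
        raise ValueError("train_scores and train_labels must have equal length")
    if not 1 <= k <= len(train_scores):
        raise ValueError("k must be between 1 and the number of training compounds")

    # Euclidean distance in the joint score space: all structures are
    # considered together, so no independence between scores is assumed.
    neighbors = sorted(
        ((math.dist(scores, query), label)
         for scores, label in zip(train_scores, train_labels)),
        key=lambda pair: pair[0],
    )

    # Majority vote among the k closest training compounds.
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical docking scores against three receptor structures
# (more negative means a better predicted fit).
train_scores = [
    (-9.1, -8.7, -9.4),
    (-8.8, -9.0, -8.5),
    (-9.5, -8.2, -9.0),
    (-6.2, -6.8, -5.9),
    (-7.0, -6.1, -6.5),
    (-5.8, -6.4, -6.9),
]
train_labels = [1, 1, 1, 0, 0, 0]

strong_binder = knn_predict(train_scores, train_labels, (-8.9, -8.5, -9.1))
weak_binder = knn_predict(train_scores, train_labels, (-6.0, -6.5, -6.3))
print(strong_binder, weak_binder)
```

With real screening data, the same score matrix could be passed to logistic regression, support vector machine, or random forest implementations from a standard machine learning library; the pure-Python version here is only meant to show the shape of the input and the classification step.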