Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, USA.
J Chem Inf Model. 2011 Dec 27;51(12):3078-92. doi: 10.1021/ci200377u. Epub 2011 Nov 21.
Applications in structural biology and medicinal chemistry require protein-ligand scoring functions for two distinct tasks: (i) ranking different poses of a small molecule in a protein binding site and (ii) ranking different small molecules by their complementarity to a protein site. Using probability theory, we developed two atomic distance-dependent statistical scoring functions: PoseScore was optimized for distinguishing native binding geometries of ligands from other poses, and RankScore was optimized for distinguishing ligands from nonbinding molecules. Both scores are derived from a set of 8,885 crystallographic structures of protein-ligand complexes but differ in the values of three key parameters. Factors influencing the accuracy of scoring were investigated, including the maximal atomic distance and non-native ligand geometries used for scoring, as well as the use of protein models instead of crystallographic structures for training and testing the scoring function. For the test set of 19 targets, RankScore improved the ligand enrichment (logAUC) and early enrichment (EF(1)) scores computed by DOCK 3.6 for 13 and 14 targets, respectively. In addition, RankScore performed better at rescoring than each of seven other scoring functions tested. When both the crystal structure and decoy geometries with all-atom root-mean-square errors of up to 2 Å from the crystal structure were accepted as correct binding poses, PoseScore gave the best score to a correct binding pose among 100 decoys in 88% of cases in a benchmark set of 100 protein-ligand complexes. PoseScore accuracy is comparable to that of DrugScore(CSD) and ITScore/SE and superior to that of 12 other tested scoring functions. Therefore, RankScore can facilitate ligand discovery by ranking complexes of the target with different small molecules, and PoseScore can be used for protein-ligand complex structure prediction by ranking different conformations of a given protein-ligand pair.
The statistical potentials are available through the Integrative Modeling Platform (IMP) software package (http://salilab.org/imp) and the LigScore Web server (http://salilab.org/ligscore/).
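The two headline metrics in the abstract, early enrichment at 1% (EF(1)) and the fraction of complexes whose best-scored pose lies within 2 Å of the crystal structure, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names, the input layout, and the convention that a lower score is better are assumptions made for illustration, and the enrichment factor uses the common textbook definition (ligand fraction in the top-ranked subset divided by the ligand fraction in the whole database), which may differ in detail from the one used in the paper.

```python
from typing import List, Sequence, Tuple


def enrichment_factor(scores: Sequence[float],
                      is_ligand: Sequence[bool],
                      fraction: float = 0.01) -> float:
    """Enrichment factor for the top `fraction` of a ranked database.

    Assumption: a lower score means a better-ranked molecule.
    """
    n = len(scores)
    if n == 0 or len(is_ligand) != n:
        raise ValueError("scores and is_ligand must be non-empty and equal in length")
    n_ligands = sum(1 for flag in is_ligand if flag)
    if n_ligands == 0:
        raise ValueError("at least one known ligand is required")
    # Size of the top-ranked subset (at least one molecule).
    n_top = max(1, int(round(n * fraction)))
    # Indices sorted from best (lowest) to worst (highest) score.
    order = sorted(range(n), key=lambda i: scores[i])
    hits = sum(1 for i in order[:n_top] if is_ligand[i])
    return (hits / n_top) / (n_ligands / n)


def pose_success_rate(complexes: Sequence[List[Tuple[float, float]]],
                      rmsd_cutoff: float = 2.0) -> float:
    """Fraction of complexes whose best-scored pose is within the RMSD cutoff.

    Each complex is a list of (score, rmsd_to_crystal) pairs, one per pose.
    Assumption: a lower score means a better-ranked pose.
    """
    if not complexes:
        raise ValueError("at least one complex is required")
    successes = 0
    for poses in complexes:
        _, best_rmsd = min(poses, key=lambda pose: pose[0])
        if best_rmsd <= rmsd_cutoff:
            successes += 1
    return successes / len(complexes)
```

With scores from any rescoring function, `enrichment_factor(scores, is_ligand)` gives an EF(1)-style value for one target, and `pose_success_rate(complexes)` gives the kind of success fraction reported as 88% for PoseScore, provided each pose list contains its RMSD to the crystal structure.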