Department of Chemistry and Zukunftskolleg, University of Konstanz, Konstanz, Germany.
J Comput Aided Mol Des. 2012 Feb;26(2):185-97. doi: 10.1007/s10822-011-9539-5. Epub 2012 Jan 10.
Given the large number of docking programs and scoring functions available, researchers starting a structure-based drug discovery project face the problem of selecting the most suitable combination. To guide this decision, several studies comparing different docking and scoring approaches have been published. When comparing scoring function performance, it is common practice to take a predefined, computer-generated set of ligand poses (decoys) and rescore it with each scoring function under comparison. But can predefined decoy sets unambiguously evaluate and rank different scoring functions with respect to pose prediction performance? This question arose when the pose prediction performance of our piecewise linear potential derived scoring functions (Korb et al., J Chem Inf Model 49:84-96, 2009) was assessed on a standard decoy set (Cheng et al., J Chem Inf Model 49:1079-1093, 2009). While these functions showed excellent pose identification performance when used to rescore the predefined decoy conformations, their performance degraded markedly when they were applied directly in docking calculations on the same test set. This implies that a discrete set of ligand poses allows only rescoring performance to be evaluated. To compare pose prediction performance more rigorously, the search space of each scoring function has to be sampled extensively, as is done in the docking calculations performed here. By analyzing performance on subsets of the complexes grouped by different properties of the active site, we were able to identify relative strengths and weaknesses of three scoring functions (ChemPLP, GoldScore, and Astex Statistical Potential). However, we could not identify reasons for the overall poor performance of all three functions on this test set compared to other test sets of similar size.
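The methodological distinction the abstract draws can be made concrete with a short sketch. In rescoring, the minimum is taken over a fixed, finite decoy set; in docking, each scoring function defines its own energy landscape, and its minima may lie far from any precomputed decoy. The Python snippet below is purely illustrative and is not the authors' code: `score_fn`, `sample_pose`, `rmsd`, and the 2.0 A success threshold (a common convention in pose prediction benchmarks) are assumptions of the sketch, not details taken from the paper.

```python
# Illustrative sketch only: contrasts decoy rescoring with docking-based
# evaluation of pose prediction. All helper names are hypothetical.

RMSD_SUCCESS = 2.0  # assumed success criterion: best-scored pose within 2 A of the crystal pose


def rescoring_success(decoys, decoy_rmsds, score_fn):
    """Rescoring protocol: rank a fixed, precomputed decoy set with the
    scoring function and check whether the best-scored decoy lies within
    the RMSD threshold of the crystal pose."""
    best = min(range(len(decoys)), key=lambda i: score_fn(decoys[i]))
    return decoy_rmsds[best] <= RMSD_SUCCESS


def docking_success(complex_, score_fn, sample_pose, rmsd, n_samples=10_000):
    """Docking protocol: sample the search space under the scoring function
    itself (stochastic search, e.g. a genetic algorithm step), then check
    the RMSD of the best-scored pose that the search actually found."""
    best_pose, best_score = None, float("inf")
    for _ in range(n_samples):
        pose = sample_pose(complex_)  # hypothetical stochastic search step
        s = score_fn(pose)
        if s < best_score:
            best_pose, best_score = pose, s
    return rmsd(best_pose, complex_.crystal_pose) <= RMSD_SUCCESS
```

The sketch makes the paper's point visible in the code itself: `rescoring_success` can only ever return a pose from the predefined decoy list, whereas `docking_success` may converge to a well-scored but geometrically wrong pose that no decoy set contains, which is why the two protocols can rank the same scoring functions differently.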