Comparative assessment of scoring functions on a diverse test set.

Author information

Cheng Tiejun, Li Xun, Li Yan, Liu Zhihai, Wang Renxiao

Affiliation

State Key Laboratory of Bioorganic Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai, P. R. China.

Publication information

J Chem Inf Model. 2009 Apr;49(4):1079-93. doi: 10.1021/ci9000053.

PMID:19358517

Abstract

Scoring functions are widely applied to the evaluation of protein-ligand binding in structure-based drug design. We have conducted a comparative assessment of 16 popular scoring functions implemented in mainstream commercial software or released by academic research groups. A set of 195 diverse protein-ligand complexes with high-resolution crystal structures and reliable binding constants was selected through a systematic nonredundant sampling of the PDBbind database and used as the primary test set in our study. All scoring functions were evaluated in three aspects, that is, "docking power", "ranking power", and "scoring power", and all evaluations were independent of the context of molecular docking or virtual screening. As for "docking power", six scoring functions, including GOLD::ASP, DS::PLP1, DrugScore(PDB), GlideScore-SP, DS::LigScore, and GOLD::ChemScore, achieved success rates over 70% when the acceptance cutoff was a root-mean-square deviation < 2.0 Å. Combining these scoring functions into consensus scoring schemes improved the success rates to 80% or even higher. As for "ranking power" and "scoring power", the top four scoring functions on the primary test set were X-Score, DrugScore(CSD), DS::PLP, and SYBYL::ChemScore. They were able to correctly rank the protein-ligand complexes containing the same type of protein with success rates around 50%. Correlation coefficients between the experimental binding constants and the binding scores computed by these scoring functions ranged from 0.545 to 0.644. Besides the primary test set, each scoring function was also tested on four additional test sets, each consisting of a certain number of protein-ligand complexes containing one particular type of protein. Our study serves as an updated benchmark for evaluating the general performance of today's scoring functions. Our results indicate that no single scoring function consistently outperforms others in all three aspects. Thus, it is important in practice to choose the appropriate scoring functions for different purposes.
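The three evaluation aspects named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions used in the study, so the code below rests on assumptions: "docking power" is taken as the fraction of complexes whose top-scored pose lies below the RMSD cutoff, "scoring power" as the Pearson correlation between experimental binding constants and computed scores, and "ranking power" as the fraction of protein groups whose complexes are ordered identically by experiment and by score. All function names are hypothetical, and scores are assumed to be sign-adjusted so that a higher value means stronger predicted binding.

```python
import math
from typing import Dict, List, Sequence, Tuple


def docking_success_rate(best_pose_rmsds: Sequence[float], cutoff: float = 2.0) -> float:
    """Fraction of complexes whose top-scored pose has RMSD (in Å) below the cutoff."""
    hits = sum(1 for rmsd in best_pose_rmsds if rmsd < cutoff)
    return hits / len(best_pose_rmsds)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranking_success_rate(groups: Dict[str, List[Tuple[float, float]]]) -> float:
    """Fraction of protein groups ranked identically by experiment and by score.

    Each group maps a protein name to (experimental affinity, computed score)
    pairs. Assumption: higher means stronger binding for both values.
    """
    correct = 0
    for pairs in groups.values():
        order_by_experiment = sorted(range(len(pairs)), key=lambda i: pairs[i][0])
        order_by_score = sorted(range(len(pairs)), key=lambda i: pairs[i][1])
        if order_by_experiment == order_by_score:
            correct += 1
    return correct / len(groups)
```

Under these assumptions, a "success rate over 70%" for docking corresponds to `docking_success_rate(...) > 0.7`, and the reported correlation range of 0.545 to 0.644 corresponds to values returned by `pearson_r`.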