State Key Laboratory of Bioorganic and Natural Products Chemistry, Collaborative Innovation Center of Chemistry for Life Sciences, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 345 Lingling Road, Shanghai 200032, People's Republic of China.
State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macau, People's Republic of China.
Acc Chem Res. 2017 Feb 21;50(2):302-309. doi: 10.1021/acs.accounts.6b00491. Epub 2017 Feb 9.
In structure-based drug design, scoring functions are widely used for fast evaluation of protein-ligand interactions. They are often applied in combination with molecular docking and de novo design methods. Since the early 1990s, a whole spectrum of protein-ligand interaction scoring functions has been developed. Regardless of their technical differences, all scoring functions need data sets that combine protein-ligand complex structures with binding affinity data for parametrization and validation. However, data sets of this kind used to be rather limited in size and quality. Moreover, standard metrics for evaluating scoring functions used to be ambiguous. Scoring functions are often tested in molecular docking or even virtual screening trials, whose outcomes do not directly reflect the genuine quality of the scoring functions themselves. Collectively, these underlying obstacles have impeded the development of more advanced scoring functions. In this Account, we describe our long-standing efforts to overcome these obstacles, which involve two related projects. In the first project, we created the PDBbind database. It is the first database that systematically annotates the protein-ligand complexes in the Protein Data Bank (PDB) with experimental binding data. The database has been updated annually since its first public release in 2004. The latest release (version 2016) provides binding data for 16 179 biomolecular complexes in the PDB. Data sets provided by PDBbind have been applied in many computational and statistical studies of protein-ligand interactions and other subjects. In particular, PDBbind has become a major data resource for scoring function development. In the second project, we established the Comparative Assessment of Scoring Functions (CASF) benchmark for scoring function evaluation. Our key idea is to decouple the "scoring" process from the "sampling" process, so that scoring functions can be tested in a relatively pure context that reflects their true quality.
In our latest work on this track, i.e., CASF-2013, the performance of a scoring function was quantified in four aspects: "scoring power", "ranking power", "docking power", and "screening power". All four performance tests were conducted on a test set of 195 high-quality protein-ligand complexes selected from PDBbind. A panel of 20 standard scoring functions was tested as a demonstration. Importantly, CASF is designed to be an open-access benchmark with which scoring functions developed by different researchers can be compared on the same footing. Indeed, it has become a popular choice for scoring function validation in recent years. Despite the considerable progress made so far, the performance of today's scoring functions still falls short of expectations in many respects, and there is a constant demand for more advanced scoring functions. Our efforts have helped to overcome some of the obstacles underlying scoring function development so that researchers in this field can move forward faster. We will continue to improve the PDBbind database and the CASF benchmark to keep them useful as community resources.
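The decoupling of scoring from sampling, together with three of the four performance aspects, can be illustrated with a small sketch. It rests on several assumptions that the abstract does not spell out. Scoring power is taken as the Pearson correlation between predicted scores and experimental affinities. Docking power is taken as the fraction of complexes whose top-scored pose, chosen from a fixed, pre-generated decoy set, lies within 2.0 Å RMSD of the native pose. Screening power is taken as an enrichment factor over the top-scored fraction of a candidate list. A higher score is assumed to mean stronger predicted binding. The function names `scoring_power`, `docking_power`, and `enrichment_factor` are hypothetical and are not the CASF-2013 reference scripts. Ranking power, which works on clusters of complexes, is left out.

```python
import math
from typing import Sequence, Tuple


def scoring_power(predicted: Sequence[float], experimental: Sequence[float]) -> float:
    """Pearson correlation between predicted scores and experimental affinities (assumed metric)."""
    n = len(predicted)
    if n != len(experimental) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    return cov / math.sqrt(var_p * var_e)


def docking_power(
    decoy_sets: Sequence[Sequence[Tuple[float, float]]], rmsd_cutoff: float = 2.0
) -> float:
    """Fraction of complexes whose top-scored pose is within rmsd_cutoff (Å) of the native pose.

    Each decoy set is a fixed list of (score, rmsd) pairs generated beforehand, so the
    scoring function only ranks poses and never samples new ones ("scoring" decoupled
    from "sampling"). Assumes a higher score means a better predicted pose.
    """
    if not decoy_sets:
        raise ValueError("need at least one decoy set")
    successes = 0
    for poses in decoy_sets:
        _score, rmsd = max(poses, key=lambda pose: pose[0])  # top-scored pose
        if rmsd <= rmsd_cutoff:
            successes += 1
    return successes / len(decoy_sets)


def enrichment_factor(
    scores: Sequence[float], is_binder: Sequence[bool], fraction: float = 0.01
) -> float:
    """Binder rate in the top-scored fraction divided by the overall binder rate (assumed metric)."""
    n = len(scores)
    n_binders = sum(is_binder)
    if n == 0 or n != len(is_binder) or n_binders == 0:
        raise ValueError("need equal-length, non-empty inputs with at least one binder")
    n_top = max(1, round(fraction * n))
    ranked = sorted(zip(scores, is_binder), key=lambda item: item[0], reverse=True)
    hits = sum(1 for _, binder in ranked[:n_top] if binder)
    return (hits / n_top) / (n_binders / n)


if __name__ == "__main__":
    # Toy numbers for illustration only; not CASF data.
    print(scoring_power([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # → 1.0
    print(docking_power([[(5.0, 1.2), (4.0, 6.0)], [(3.0, 7.5), (2.0, 0.8)]]))  # → 0.5
```

Because `docking_power` receives only precomputed poses, two scoring functions given the same decoy sets differ only in how they rank those poses. That is the sense in which the benchmark isolates scoring quality from the search algorithm.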