Forging the Basis for Developing Protein-Ligand Interaction Scoring Functions.

State Key Laboratory of Bioorganic and Natural Products Chemistry, Collaborative Innovation Center of Chemistry for Life Sciences, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 345 Lingling Road, Shanghai 200032, People's Republic of China.

State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macau, People's Republic of China.

Acc Chem Res. 2017 Feb 21;50(2):302-309. doi: 10.1021/acs.accounts.6b00491. Epub 2017 Feb 9.

In structure-based drug design, scoring functions are widely used for fast evaluation of protein-ligand interactions. They are often applied in combination with molecular docking and de novo design methods. Since the early 1990s, a whole spectrum of protein-ligand interaction scoring functions has been developed. Regardless of their technical differences, all scoring functions need data sets that combine protein-ligand complex structures with binding affinity data for parametrization and validation. However, data sets of this kind used to be rather limited in size and quality. In addition, standard metrics for evaluating scoring functions used to be ambiguous. Scoring functions are often tested in molecular docking or even virtual screening trials, which do not directly reflect the genuine quality of the scoring functions themselves. Collectively, these underlying obstacles have impeded the invention of more advanced scoring functions.

In this Account, we describe our long-standing efforts to overcome these obstacles, which involve two related projects. In the first project, we created the PDBbind database. It is the first database that systematically annotates the protein-ligand complexes in the Protein Data Bank (PDB) with experimental binding data. This database has been updated annually since its first public release in 2004. The latest release (version 2016) provides binding data for 16 179 biomolecular complexes in the PDB. Data sets provided by PDBbind have been applied to many computational and statistical studies on protein-ligand interactions and various other subjects. In particular, PDBbind has become a major data resource for scoring function development.

In the second project, we established the Comparative Assessment of Scoring Functions (CASF) benchmark for scoring function evaluation. Our key idea is to decouple the "scoring" process from the "sampling" process, so that scoring functions can be tested in a relatively pure context that reflects their quality. In our latest work on this track, CASF-2013, the performance of a scoring function was quantified in four aspects: "scoring power", "ranking power", "docking power", and "screening power". All four performance tests were conducted on a test set containing 195 high-quality protein-ligand complexes selected from PDBbind. A panel of 20 standard scoring functions was tested as a demonstration. Importantly, CASF is designed to be an open-access benchmark, with which scoring functions developed by different researchers can be compared on the same grounds. Indeed, it has become a popular choice for scoring function validation in recent years.

Despite the considerable progress made so far, the performance of today's scoring functions still falls short of expectations in many aspects. There is a constant demand for more advanced scoring functions. Our efforts have helped to overcome some of the obstacles underlying scoring function development, so that researchers in this field can move forward faster. We will continue to improve the PDBbind database and the CASF benchmark to keep them useful as community resources.
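The abstract names the four CASF performance aspects but does not define them. As a rough illustration of how two of them might be computed, the sketch below assumes that "scoring power" is measured as the Pearson correlation between predicted scores and experimental binding affinities, and that "docking power" is the fraction of complexes whose top-scored pose lies within an RMSD cutoff of the native pose. These definitions, the function names, the 2.0 Å default cutoff, and the toy data are all assumptions made for illustration; none of them is taken from the abstract.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def docking_success_rate(poses_by_complex, rmsd_cutoff=2.0):
    """Fraction of complexes whose best-scored pose is near the native pose.

    poses_by_complex maps a complex id to a list of (score, rmsd) pairs,
    where a HIGHER score means a better predicted pose (an assumption;
    flip the sign first for energy-like scores) and rmsd is the distance
    of that pose from the native binding pose.
    """
    if not poses_by_complex:
        raise ValueError("no complexes given")
    hits = 0
    for poses in poses_by_complex.values():
        # Pick the pose the scoring function ranks first.
        _, best_rmsd = max(poses, key=lambda pose: pose[0])
        if best_rmsd <= rmsd_cutoff:
            hits += 1
    return hits / len(poses_by_complex)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from PDBbind or CASF.
    predicted = [5.1, 6.3, 7.8, 4.2]
    experimental = [5.0, 6.9, 7.5, 4.8]
    print("scoring power (Pearson r):", round(pearson_r(predicted, experimental), 3))

    decoys = {
        "complex_a": [(9.1, 1.2), (7.4, 4.0)],
        "complex_b": [(6.0, 3.5), (5.2, 0.5)],
    }
    print("docking power (success rate):", docking_success_rate(decoys))
```

Both functions take only precomputed scores and pose RMSDs as input, so no pose sampling happens inside the evaluation. This mirrors the decoupling of "scoring" from "sampling" described in the abstract, although an actual benchmark run would use the published test set and protocol rather than toy values.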

Similar articles

Forging the Basis for Developing Protein-Ligand Interaction Scoring Functions.

Acc Chem Res. 2017 Feb 21;50(2):302-309. doi: 10.1021/acs.accounts.6b00491. Epub 2017 Feb 9.

Comparative assessment of scoring functions on an updated benchmark: 2. Evaluation methods and general results.

J Chem Inf Model. 2014 Jun 23;54(6):1717-36. doi: 10.1021/ci500081m. Epub 2014 Jun 2.

Comparative Assessment of Scoring Functions: The CASF-2016 Update.

J Chem Inf Model. 2019 Feb 25;59(2):895-913. doi: 10.1021/acs.jcim.8b00545. Epub 2018 Dec 11.

Comparative assessment of scoring functions on an updated benchmark: 1. Compilation of the test set.

J Chem Inf Model. 2014 Jun 23;54(6):1700-16. doi: 10.1021/ci500080q. Epub 2014 Jun 2.

Comparative assessment of scoring functions on a diverse test set.

J Chem Inf Model. 2009 Apr;49(4):1079-93. doi: 10.1021/ci9000053.

Assessing protein-ligand interaction scoring functions with the CASF-2013 benchmark.

Nat Protoc. 2018 Apr;13(4):666-680. doi: 10.1038/nprot.2017.114. Epub 2018 Mar 8.

Development of a new benchmark for assessing the scoring functions applicable to protein-protein interactions.

Future Med Chem. 2018 Jul 1;10(13):1555-1574. doi: 10.4155/fmc-2017-0261. Epub 2018 Jun 28.

Machine learning in computational docking.

Artif Intell Med. 2015 Mar;63(3):135-52. doi: 10.1016/j.artmed.2015.02.002. Epub 2015 Feb 16.

Comprehensive evaluation of ten docking programs on a diverse set of protein-ligand complexes: the prediction accuracy of sampling power and scoring power.

Phys Chem Chem Phys. 2016 May 14;18(18):12964-75. doi: 10.1039/c6cp01555g. Epub 2016 Apr 25.

Evaluation of AutoDock and AutoDock Vina on the CASF-2013 Benchmark.

J Chem Inf Model. 2018 Aug 27;58(8):1697-1706. doi: 10.1021/acs.jcim.8b00312. Epub 2018 Jul 25.

Cited by

Relevance of 3D Rotationally Equivariant Neural Networks for Predicting Protein-Ligand Binding Affinities.

Interdiscip Sci. 2025 Aug 14. doi: 10.1007/s12539-025-00745-z.

Predicting receptor-ligand pairing preferences in plant-microbe interfaces via molecular dynamics and machine learning.

Comput Struct Biotechnol J. 2025 Jun 18;27:2782-2795. doi: 10.1016/j.csbj.2025.06.029. eCollection 2025.

Predicting Affinity Through Homology (PATH): Interpretable binding affinity prediction with persistent homology.

PLoS Comput Biol. 2025 Jun 27;21(6):e1013216. doi: 10.1371/journal.pcbi.1013216. eCollection 2025 Jun.

Comment on "Contrastive pre-training and 3D convolution neural network for RNA and small molecule binding affinity prediction" by Sun and Gao.

Bioinformatics. 2025 Jun 2;41(6). doi: 10.1093/bioinformatics/btaf163.

CACHE Challenge #2: Targeting the RNA Site of the SARS-CoV-2 Helicase Nsp13.

J Chem Inf Model. 2025 Jul 14;65(13):6884-6898. doi: 10.1021/acs.jcim.5c00535. Epub 2025 Jun 20.

Advancing active compound discovery for novel drug targets: insights from AI-driven approaches.

Acta Pharmacol Sin. 2025 Jun 17. doi: 10.1038/s41401-025-01591-x.

Factors Influencing the Binding of HIV-1 Protease Inhibitors: Insights from Machine Learning Models.

ChemMedChem. 2025 Aug 2;20(15):e202500277. doi: 10.1002/cmdc.202500277. Epub 2025 Jun 21.

The Quasi-Bound State as a Predictor of Relative Binding Free Energy.

J Chem Inf Model. 2025 Jun 9;65(11):5544-5552. doi: 10.1021/acs.jcim.5c00289. Epub 2025 May 20.

Assessing interaction recovery of predicted protein-ligand poses.

J Cheminform. 2025 May 19;17(1):76. doi: 10.1186/s13321-025-01011-6.

EM-PLA: environment-aware heterogeneous graph-based multimodal protein-ligand binding affinity prediction.

Bioinformatics. 2025 Jul 1;41(7). doi: 10.1093/bioinformatics/btaf298.
