Benchmarking methods and data sets for ligand enrichment assessment in virtual screening.

Authors

Xia Jie, Tilahun Ermias Lemma, Reid Terry-Elinor, Zhang Liangren, Wang Xiang Simon

Affiliations

State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Beijing 100191, PR China; Molecular Modeling and Drug Discovery Core for District of Columbia Developmental Center for AIDS Research (DC D-CFAR), Laboratory of Cheminformatics and Drug Design, Department of Pharmaceutical Sciences, College of Pharmacy, Howard University, Washington, DC 20059, USA.

Molecular Modeling and Drug Discovery Core for District of Columbia Developmental Center for AIDS Research (DC D-CFAR), Laboratory of Cheminformatics and Drug Design, Department of Pharmaceutical Sciences, College of Pharmacy, Howard University, Washington, DC 20059, USA.

Publication information

Methods. 2015 Jan;71:146-57. doi: 10.1016/j.ymeth.2014.11.015. Epub 2014 Dec 3.

Abstract

Retrospective small-scale virtual screening (VS) based on benchmarking data sets has been widely used to estimate the ligand enrichment of VS approaches in prospective (i.e., real-world) efforts. However, intrinsic differences between benchmarking sets and real screening libraries can bias the assessment. Herein, we summarize the history of benchmarking methods and data sets and highlight three main types of bias found in benchmarking sets: "analogue bias", "artificial enrichment" and "false negative". In addition, we introduce our recent algorithm for building maximum-unbiased benchmarking sets applicable to both ligand-based and structure-based VS approaches, and its application to three important human histone deacetylase (HDAC) isoforms: HDAC1, HDAC6 and HDAC8. Leave-one-out cross-validation (LOO CV) demonstrates that the benchmarking sets built by our algorithm are maximum-unbiased as measured by property matching, ROC curves and AUCs.
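The abstract names ROC curves and AUCs as enrichment measures but does not say how they are computed. As a minimal sketch, assuming each screened compound has a score (higher means more likely active) and a binary label (1 for a known ligand, 0 for a decoy), the two quantities most often reported in retrospective VS can be computed as follows. The function names `roc_auc` and `enrichment_factor` and the toy data are hypothetical illustrations, not the authors' implementation.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the pairwise (Mann-Whitney) identity: the fraction of
    active/decoy pairs in which the active scores higher; ties count half."""
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


def enrichment_factor(scores: Sequence[float], labels: Sequence[int],
                      fraction: float = 0.1) -> float:
    """Hit rate in the top-scored fraction divided by the overall hit rate."""
    n = len(scores)
    total_hits = sum(labels)
    if n == 0 or total_hits == 0:
        raise ValueError("need a non-empty set with at least one active")
    n_top = max(1, round(n * fraction))
    # Rank compounds from best to worst score and count actives in the top slice.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(y for _, y in ranked[:n_top])
    return (hits_top / n_top) / (total_hits / n)


# Toy data (assumed for illustration): 3 actives among 10 compounds.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(roc_auc(scores, labels))
print(enrichment_factor(scores, labels, fraction=0.2))
```

An AUC near 0.5 indicates ranking no better than random, which is why a benchmarking set that yields a high AUC for a trivial property-based score is suspected of artificial enrichment.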
