Toward a benchmarking data set able to evaluate ligand- and structure-based virtual screening using public HTS data.

Author information

Lindh Martin, Svensson Fredrik, Schaal Wesley, Zhang Jin, Sköld Christian, Brandt Peter, Karlén Anders

Affiliation

Organic Pharmaceutical Chemistry, Department of Medicinal Chemistry, Uppsala University, Biomedical Centre, Box 574, SE-751 23 Uppsala, Sweden.

Publication information

J Chem Inf Model. 2015 Feb 23;55(2):343-53. doi: 10.1021/ci5005465. Epub 2015 Jan 28.

DOI:10.1021/ci5005465

PMID:25564966

Abstract

Virtual screening has the potential to accelerate and reduce costs of probe development and drug discovery. To develop and benchmark virtual screening methods, validation data sets are commonly used. Over the years, such data sets have been constructed to overcome the problems of analogue bias and artificial enrichment. With the rapid growth of public domain databases containing high-throughput screening data, such as the PubChem BioAssay database, there is an increased possibility to use such data for validation. In this study, we identify PubChem data sets suitable for validation of both structure- and ligand-based virtual screening methods. To achieve this, high-throughput screening data for which a crystal structure of the bioassay target was available in the PDB were identified. Thereafter, the data sets were inspected to identify structures and data suitable for use in validation studies. In this work, we present seven data sets (MMP13, DUSP3, PTPN22, EPHX2, CTDSP1, MAPK10, and CDK5) compiled using this method. In the seven data sets, the number of active compounds varies between 19 and 369 and the number of inactive compounds between 59 405 and 337 634. This gives a higher ratio of the number of inactive to active compounds than what is found in most benchmark data sets. We have also evaluated the screening performance using docking and 3D shape similarity with default settings. To characterize the data sets, we used physicochemical similarity and 2D fingerprint searches. We envision that these data sets can be a useful complement to current data sets used for method evaluation.
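
As an illustration of how screening performance on data sets with very few actives among many inactives is commonly summarized, the sketch below computes an enrichment factor from a ranked list of activity labels. This is a minimal example under stated assumptions: the abstract does not say which metric the authors used, and the function name `enrichment_factor`, the toy labels, and the 20% cutoff are hypothetical choices made only for illustration.

```python
from typing import Sequence


def enrichment_factor(labels_ranked: Sequence[int], fraction: float) -> float:
    """Enrichment factor at a given top fraction of a ranked compound list.

    labels_ranked: 1 for active, 0 for inactive, ordered from best to
                   worst score (hypothetical input format).
    fraction:      share of the list treated as the "top" subset, in (0, 1].
    """
    n_total = len(labels_ranked)
    n_actives = sum(labels_ranked)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Size of the top-ranked subset, at least one compound.
    n_top = max(1, int(round(n_total * fraction)))
    hits_top = sum(labels_ranked[:n_top])

    # Hit rate in the top subset divided by the hit rate of the whole set.
    return (hits_top / n_top) / (n_actives / n_total)


# Toy ranking: 2 actives among 10 compounds, both ranked first.
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
ef_20 = enrichment_factor(toy_labels, 0.2)
```

A value of 1 corresponds to random ranking. Because the ratio of inactives to actives in these data sets is high, the overall hit rate in the denominator is small, so the attainable enrichment at small fractions is large.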

Similar articles

Importance of the pharmacological profile of the bound ligand in enrichment on nuclear receptors: toward the use of experimentally validated decoy ligands.

J Chem Inf Model. 2014 Oct 27;54(10):2915-44. doi: 10.1021/ci500305c. Epub 2014 Oct 9.

Benchmarking ligand-based virtual High-Throughput Screening with the PubChem database.

Molecules. 2013 Jan 8;18(1):735-56. doi: 10.3390/molecules18010735.

Toward fully automated high performance computing drug discovery: a massively parallel virtual screening pipeline for docking and molecular mechanics/generalized Born surface area rescoring to improve enrichment.

J Chem Inf Model. 2014 Jan 27;54(1):324-37. doi: 10.1021/ci4005145. Epub 2014 Jan 3.

Benchmarking Data Sets from PubChem BioAssay Data: Current Scenario and Room for Improvement.

Int J Mol Sci. 2020 Jun 19;21(12):4380. doi: 10.3390/ijms21124380.

Heterogeneous classifier fusion for ligand-based virtual screening: or, how decision making by committee can be a good thing.

J Chem Inf Model. 2013 Nov 25;53(11):2829-36. doi: 10.1021/ci400466r. Epub 2013 Nov 14.

Structure-based virtual screening of MT2 melatonin receptor: influence of template choice and structural refinement.

J Chem Inf Model. 2013 Apr 22;53(4):821-35. doi: 10.1021/ci4000147. Epub 2013 Apr 9.

REPROVIS-DB: a benchmark system for ligand-based virtual screening derived from reproducible prospective applications.

J Chem Inf Model. 2011 Oct 24;51(10):2467-73. doi: 10.1021/ci200309j. Epub 2011 Sep 26.

LIT-PCBA: An Unbiased Data Set for Machine Learning and Virtual Screening.

J Chem Inf Model. 2020 Sep 28;60(9):4263-4273. doi: 10.1021/acs.jcim.0c00155. Epub 2020 Apr 23.

J Chem Inf Model. 2013 Mar 25;53(3):692-703. doi: 10.1021/ci300607r. Epub 2013 Mar 5.

Cited by

Benchmarking Data Sets from PubChem BioAssay Data: Current Scenario and Room for Improvement.

Int J Mol Sci. 2020 Jun 19;21(12):4380. doi: 10.3390/ijms21124380.

Targeting the C-Terminal Domain Small Phosphatase 1.

Life (Basel). 2020 May 8;10(5):57. doi: 10.3390/life10050057.

Discovery of Potent Disheveled/Dvl Inhibitors Using Virtual Screening Optimized With NMR-Based Docking Performance Index.

Front Pharmacol. 2018 Sep 5;9:983. doi: 10.3389/fphar.2018.00983. eCollection 2018.

Rescoring of docking poses under Occam's Razor: are there simpler solutions?

J Comput Aided Mol Des. 2018 Sep;32(9):877-888. doi: 10.1007/s10822-018-0155-5. Epub 2018 Sep 1.

Efficient iterative virtual screening with Apache Spark and conformal prediction.

J Cheminform. 2018 Mar 1;10(1):8. doi: 10.1186/s13321-018-0265-z.

Prediction of Protein-compound Binding Energies from Known Activity Data: Docking-score-based Method and its Applications.

Mol Inform. 2018 Jul;37(6-7):e1700120. doi: 10.1002/minf.201700120. Epub 2018 Feb 14.

VB-MK-LMF: fusion of drugs, targets and interactions using variational Bayesian multiple kernel logistic matrix factorization.

BMC Bioinformatics. 2017 Oct 4;18(1):440. doi: 10.1186/s12859-017-1845-z.

Quantitative Structure-activity Relationship (QSAR) Models for Docking Score Correction.

Mol Inform. 2017 Jan;36(1-2). doi: 10.1002/minf.201600013. Epub 2016 Apr 29.

In Silico Exploration for Novel Type-I Inhibitors of Tie-2/TEK: The Performance of Different Selection Strategy in Selecting Virtual Screening Candidates.

Sci Rep. 2016 Nov 23;6:37628. doi: 10.1038/srep37628.

Getting the most out of PubChem for virtual screening.

Expert Opin Drug Discov. 2016 Sep;11(9):843-55. doi: 10.1080/17460441.2016.1216967. Epub 2016 Aug 5.
