Toward a benchmarking data set able to evaluate ligand- and structure-based virtual screening using public HTS data.

Author information

Lindh Martin, Svensson Fredrik, Schaal Wesley, Zhang Jin, Sköld Christian, Brandt Peter, Karlén Anders

Affiliation

Organic Pharmaceutical Chemistry, Department of Medicinal Chemistry, Uppsala University, Biomedical Centre, Box 574, SE-751 23 Uppsala, Sweden.

Publication information

J Chem Inf Model. 2015 Feb 23;55(2):343-53. doi: 10.1021/ci5005465. Epub 2015 Jan 28.

Abstract

Virtual screening has the potential to accelerate and reduce costs of probe development and drug discovery. To develop and benchmark virtual screening methods, validation data sets are commonly used. Over the years, such data sets have been constructed to overcome the problems of analogue bias and artificial enrichment. With the rapid growth of public domain databases containing high-throughput screening data, such as the PubChem BioAssay database, there is an increased possibility to use such data for validation. In this study, we identify PubChem data sets suitable for validation of both structure- and ligand-based virtual screening methods. To achieve this, high-throughput screening data for which a crystal structure of the bioassay target was available in the PDB were identified. Thereafter, the data sets were inspected to identify structures and data suitable for use in validation studies. In this work, we present seven data sets (MMP13, DUSP3, PTPN22, EPHX2, CTDSP1, MAPK10, and CDK5) compiled using this method. In the seven data sets, the number of active compounds varies between 19 and 369 and the number of inactive compounds between 59 405 and 337 634. This gives a higher ratio of the number of inactive to active compounds than what is found in most benchmark data sets. We have also evaluated the screening performance using docking and 3D shape similarity with default settings. To characterize the data sets, we used physicochemical similarity and 2D fingerprint searches. We envision that these data sets can be a useful complement to current data sets used for method evaluation.
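To make the class imbalance concrete, the sketch below works out the bounds on the inactive-to-active ratio implied by the ranges quoted in the abstract. The abstract does not say which active count belongs to which inactive count, so only the extreme combinations are computed; these are bounds, not per-target values. The `enrichment_factor` helper is an illustrative assumption: it implements the standard enrichment factor often used to score virtual screening, and the abstract does not state which metrics the authors used.

```python
import math

# Ranges quoted in the abstract (across the seven data sets).
ACTIVES_MIN, ACTIVES_MAX = 19, 369
INACTIVES_MIN, INACTIVES_MAX = 59_405, 337_634


def inactive_to_active_ratio(n_inactive: int, n_active: int) -> float:
    """Number of inactive compounds per active compound."""
    if n_active <= 0:
        raise ValueError("n_active must be positive")
    return n_inactive / n_active


# The pairing of counts per target is not given in the abstract,
# so these are only the loosest possible bounds on the ratio.
lowest_possible = inactive_to_active_ratio(INACTIVES_MIN, ACTIVES_MAX)   # roughly 161
highest_possible = inactive_to_active_ratio(INACTIVES_MAX, ACTIVES_MIN)  # roughly 17,770


def enrichment_factor(labels_ranked: list[int], fraction: float) -> float:
    """Standard enrichment factor at a given top fraction of a ranked list.

    labels_ranked: 1 for active, 0 for inactive, ordered best score first.
    fraction: portion of the ranked list to inspect, e.g. 0.01 for the top 1%.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_total = len(labels_ranked)
    n_actives = sum(labels_ranked)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    n_top = max(1, math.ceil(fraction * n_total))
    hit_rate_top = sum(labels_ranked[:n_top]) / n_top
    hit_rate_all = n_actives / n_total
    return hit_rate_top / hit_rate_all


if __name__ == "__main__":
    print(f"ratio bounds: {lowest_possible:.0f} to {highest_possible:.0f} inactives per active")
    # Toy ranking: 2 actives among 10 compounds, 1 of them in the top 20%.
    toy = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    print(f"toy EF at 20%: {enrichment_factor(toy, 0.2):.2f}")
```

Even the lower bound, about 161 inactives per active, means a random selection has well under a 1% hit rate, which is why ratio-normalized measures such as the enrichment factor are easier to compare across data sets than raw hit counts.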

