High-Throughput Screening Assay Datasets from the PubChem Database.

Author information

Butkiewicz Mariusz, Wang Yanli, Bryant Stephen H, Lowe Edward W, Weaver David C, Meiler Jens

Affiliations

Department of Chemistry, Pharmacology and Biomedical Informatics, Center for Structural Biology, Institute of Chemical Biology, Vanderbilt University, Nashville, USA.

National Institutes of Health, National Center for Biotechnology Information, US National Library of Medicine, Bethesda, USA.

Publication information

Chem Inform. 2017;3(1). Epub 2017 Apr 26.

Abstract

Availability of high-throughput screening (HTS) data in the public domain offers great potential to foster development of ligand-based computer-aided drug discovery (LB-CADD) methods crucial for drug discovery efforts in academia and industry. LB-CADD method development depends on high-quality HTS assay data, i.e., datasets that contain both active and inactive compounds. These active compounds are hits from primary screens that have been tested in concentration-response experiments and whose target specificity has been validated through suitable secondary screening experiments. Publicly available HTS repositories such as PubChem often provide such data in a convoluted way: compounds that are classified as inactive need to be extracted from the primary screening record. However, compounds classified as active in the primary screening record are not suitable as a set of active compounds for LB-CADD experiments because of the high false-positive rate. A suitable set of actives can be derived by carefully analysing the results of often five or more assays that are used to confirm and classify the activity of compounds. These assays, in part, build on each other. However, often not all hit compounds from the previous screen have been tested. Sometimes a compound is classified as 'active' in an assay even though this means it is 'inactive' on the target of interest, because it is 'active' on a different target protein. Here, a curation process for hierarchically related confirmatory screens is illustrated using two specifically chosen protein use cases. The subsequent re-upload procedure into PubChem is described for the findings of those two scenarios. Further, we provide nine publicly accessible high-quality datasets for future LB-CADD method development, giving the scientific community a common baseline for comparing future methods. We also provide a protocol researchers can follow to upload additional datasets for benchmarking.
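
The curation logic outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Screen` class, the `curate` function, and the compound IDs are hypothetical names and values chosen for illustration. The sketch assumes that inactives are taken from the primary screen, that a primary hit counts as active only if every confirmatory screen retested and confirmed it, and that a hit flagged 'active' in a counter screen (activity on a different target) is dropped. Excluding hits that were never retested is an assumption of this sketch, not a rule stated in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Screen:
    """One assay record (hypothetical structure for illustration)."""
    tested: frozenset   # compound IDs tested in this assay
    active: frozenset   # compound IDs flagged 'active' in this assay
    counter_screen: bool = False  # True if 'active' means activity on a different target


def curate(primary, follow_ups):
    """Return (actives, inactives) from a primary screen and its follow-up screens."""
    # Inactives come straight from the primary screening record.
    inactives = set(primary.tested - primary.active)
    # Primary hits are only candidates; many are expected to be false positives.
    candidates = set(primary.active)
    for screen in follow_ups:
        # Assumption: hits that were not retested in this screen are excluded.
        candidates &= screen.tested
        if screen.counter_screen:
            # 'active' here signals off-target activity, so drop those compounds.
            candidates -= screen.active
        else:
            # Keep only hits confirmed as active.
            candidates &= screen.active
    return candidates, inactives


# Toy data with made-up compound IDs.
primary = Screen(tested=frozenset({1, 2, 3, 4, 5, 6}), active=frozenset({1, 2, 3, 4}))
confirmation = Screen(tested=frozenset({1, 2, 3}), active=frozenset({1, 2}))
counter = Screen(tested=frozenset({1, 2}), active=frozenset({2}), counter_screen=True)

actives, inactives = curate(primary, [confirmation, counter])
print(sorted(actives), sorted(inactives))
```

In this toy run, compound 4 is excluded because it was never retested, compound 3 failed confirmation, and compound 2 is removed by the counter screen, which leaves compound 1 as the only curated active and compounds 5 and 6 as inactives.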

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4e6f/5962024/d53622928595/nihms936862f1.jpg
