Benchmarking ligand-based virtual High-Throughput Screening with the PubChem database.

Affiliation

Department of Chemistry, Pharmacology, and Biomedical Informatics, Center for Structural Biology, Institute of Chemical Biology, Vanderbilt University, Nashville, TN 37232, USA.

Publication information

Molecules. 2013 Jan 8;18(1):735-56. doi: 10.3390/molecules18010735.

Abstract

With the rapidly increasing availability of High-Throughput Screening (HTS) data in the public domain, such as the PubChem database, methods for ligand-based computer-aided drug discovery (LB-CADD) have the potential to accelerate and reduce the cost of probe development and drug discovery efforts in academia. We assemble nine data sets from realistic HTS campaigns representing major families of drug target proteins for benchmarking LB-CADD methods. Each data set is in the public domain through PubChem and has been carefully collated through confirmation screens that validate active compounds. These data sets provide the foundation for benchmarking a new cheminformatics framework, BCL::ChemInfo, which is freely available for non-commercial use. Quantitative structure-activity relationship (QSAR) models are built using Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), Decision Trees (DTs), and Kohonen networks (KNs). Problem-specific descriptor optimization protocols are assessed, including Sequential Feature Forward Selection (SFFS) and various information content measures. Measures of predictive power and confidence are evaluated through cross-validation, and a consensus prediction scheme is tested that combines orthogonal machine learning algorithms into a single predictor. Enrichments ranging from 15 to 101 are observed at a true positive rate (TPR) cutoff of 25%.
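To make the enrichment figure concrete, the following is a minimal Python sketch of how enrichment at a TPR cutoff could be computed from ranked predictions. It assumes the common definition of enrichment as the fraction of actives among the selected compounds divided by the fraction of actives in the whole data set; the abstract does not state the formula, so this definition, the function name `enrichment_at_tpr`, and the toy data are illustrative assumptions and not part of BCL::ChemInfo.

```python
def enrichment_at_tpr(scores, labels, tpr_cutoff=0.25):
    """Enrichment among top-ranked compounds once `tpr_cutoff` of actives are recovered.

    Assumed definition (not taken from the abstract):
        enrichment = (TP / (TP + FP)) / (P / (P + N))
    `scores` are predicted activities; `labels` are 1 (active) or 0 (inactive).
    Ties between scores are not treated specially in this sketch.
    """
    n_total = len(labels)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one active compound")

    # Rank compounds from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    base_rate = n_actives / n_total  # fraction of actives in the full set

    true_pos = 0
    for n_selected, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        # Stop at the first rank where the requested share of actives is found.
        if true_pos / n_actives >= tpr_cutoff:
            precision = true_pos / n_selected
            return precision / base_rate

    raise ValueError("tpr_cutoff must not exceed 1")


# Toy example with hypothetical scores: 4 actives among 8 compounds.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0, 0, 1, 1]
print(enrichment_at_tpr(scores, labels, tpr_cutoff=0.25))  # 2.0
```

Under this definition, an enrichment of 15 would mean that the compounds selected at the 25% TPR threshold contain actives at 15 times the rate of the full screening set.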


