Discovering Highly Potent Molecules from an Initial Set of Inactives Using Iterative Screening.

Affiliations

Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom.

Centre for Medical Image Computing, Department of Computer Science, UCL, London WC1E 6BT, United Kingdom.

Publication information

J Chem Inf Model. 2018 Sep 24;58(9):2000-2014. doi: 10.1021/acs.jcim.8b00376. Epub 2018 Sep 10.

DOI:10.1021/acs.jcim.8b00376

PMID:30130102

Abstract

The versatility of similarity searching and quantitative structure-activity relationships to model the activity of compound sets within given bioactivity ranges (i.e., interpolation) is well established. However, their relative performance in a scenario common in early-stage drug discovery, where many inactive data points but no active data points are available (i.e., extrapolation from the low-activity to the high-activity range), has not yet been thoroughly examined. To this end, we designed an iterative virtual screening strategy and evaluated it on 25 diverse bioactivity data sets from ChEMBL. We benchmark the efficiency of random forest (RF), multiple linear regression, ridge regression, similarity searching, and random selection of compounds in identifying a highly active molecule in the test set among a large number of low-potency compounds. We use the number of iterations required to find this active molecule to evaluate the performance of each experimental setup. We show that linear and ridge regression often outperform RF and similarity searching, reducing the number of iterations needed to find an active compound by a factor of 2 or more. Even simple regression methods seem better able to extrapolate to high-bioactivity ranges than RF, which only provides output values in the range covered by the training set. In addition, examination of the scaffold diversity in the data sets used shows that in some cases similarity searching and RF require twice as many iterations as random selection, depending on the chemical space covered in the initial training data. Lastly, we show using bioactivity data for COX-1 and COX-2 that our framework can be extended to multitarget drug discovery, where compounds are selected by concomitantly considering their activity against multiple targets. Overall, this study provides an approach to iterative screening for discovering highly potent compounds when only inactive data are available in the early stages of drug discovery, and identifies the experimental setup best suited to doing so.
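The iterative loop described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the single numeric descriptor, the batch size of one compound per iteration, the activity threshold, and the toy data. The 1-nearest-neighbour predictor is only a stand-in for any model whose output stays within the range of training activities (the limitation the abstract attributes to RF); it is not a random forest.

```python
import random


def linear_predict(train_x, train_y, pool_x):
    """Least-squares line on one descriptor; predictions may exceed the training range."""
    n = len(train_x)
    mean_x = sum(train_x) / n
    mean_y = sum(train_y) / n
    sxx = sum((x - mean_x) ** 2 for x in train_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return [slope * x + intercept for x in pool_x]


def nearest_neighbour_predict(train_x, train_y, pool_x):
    """1-NN regression: every prediction is an activity already seen in training."""
    preds = []
    for x in pool_x:
        nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
        preds.append(train_y[nearest])
    return preds


def make_random_predict(seed):
    """Random-selection baseline: scores carry no information about activity."""
    rng = random.Random(seed)

    def predict(train_x, train_y, pool_x):
        return [rng.random() for _ in pool_x]

    return predict


def iterative_screen(train, pool, predict, threshold, batch_size=1):
    """Return the number of iterations until a compound with activity >= threshold
    is selected, or None if the pool is exhausted first.

    train and pool are lists of (descriptor, activity) pairs; pool activities are
    only revealed once a compound has been selected ("tested").
    """
    train, pool = list(train), list(pool)  # work on copies
    iterations = 0
    while pool:
        iterations += 1
        preds = predict([x for x, _ in train], [y for _, y in train],
                        [x for x, _ in pool])
        # Rank the pool by predicted activity; ties keep their pool order.
        ranked = sorted(range(len(pool)), key=lambda i: preds[i], reverse=True)
        picked = ranked[:batch_size]
        tested = [pool[i] for i in picked]
        if any(y >= threshold for _, y in tested):
            return iterations
        train.extend(tested)  # retrain on the newly measured compounds
        pool = [c for i, c in enumerate(pool) if i not in picked]
    return None


def true_activity(x):
    """Hypothetical noiseless activity (e.g. a pIC50-like value) for the toy data."""
    return 2.0 * x + 1.0


# Initial training set holds only low-activity compounds; the pool hides one active.
train = [(x, true_activity(x)) for x in (1.0, 1.5, 2.0, 2.5)]
pool = [(x, true_activity(x)) for x in (2.2, 1.2, 2.8, 3.0, 1.8, 4.0)]

for name, predict in [("linear", linear_predict),
                      ("1-NN", nearest_neighbour_predict),
                      ("random", make_random_predict(0))]:
    print(name, iterative_screen(train, pool, predict, threshold=8.5))
```

On this toy data the linear model selects the active compound in the first iteration, because it extrapolates beyond the activities it was trained on. The nearest-neighbour stand-in needs three iterations: it assigns the same predicted activity to every pool compound above the training range, so it can only move toward the active one step at a time, and its exact count depends on how ties are broken.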

Similar articles

Evaluation of QSAR Equations for Virtual Screening.

Int J Mol Sci. 2020 Oct 22;21(21):7828. doi: 10.3390/ijms21217828.

Discovery of multitarget-directed ligands against Alzheimer's disease through systematic prediction of chemical-protein interactions.

J Chem Inf Model. 2015 Jan 26;55(1):149-64. doi: 10.1021/ci500574n. Epub 2015 Jan 13.

Comparing the Influence of Simulated Experimental Errors on 12 Machine Learning Algorithms in Bioactivity Modeling Using 12 Diverse Data Sets.

J Chem Inf Model. 2015 Jul 27;55(7):1413-25. doi: 10.1021/acs.jcim.5b00101. Epub 2015 Jun 18.

How diverse are diversity assessment methods? A comparative analysis and benchmarking of molecular descriptor space.

J Chem Inf Model. 2014 Jan 27;54(1):230-42. doi: 10.1021/ci400469u. Epub 2013 Dec 13.

J Chem Inf Model. 2013 Jul 22;53(7):1613-9. doi: 10.1021/ci4003206. Epub 2013 Jul 9.

Data-Driven Derivation of an "Informer Compound Set" for Improved Selection of Active Compounds in High-Throughput Screening.

J Chem Inf Model. 2016 Sep 26;56(9):1622-30. doi: 10.1021/acs.jcim.6b00244. Epub 2016 Aug 16.

WDL-RF: predicting bioactivities of ligand molecules acting with G protein-coupled receptors by combining weighted deep learning and random forest.

Bioinformatics. 2018 Jul 1;34(13):2271-2282. doi: 10.1093/bioinformatics/bty070.

Critical comparison of virtual screening methods against the MUV data set.

J Chem Inf Model. 2009 Oct;49(10):2168-78. doi: 10.1021/ci900249b.

A decision-theoretic approach to the evaluation of machine learning algorithms in computational drug discovery.

Bioinformatics. 2019 Nov 1;35(22):4656-4663. doi: 10.1093/bioinformatics/btz293.

Cited by

MATEO: intermolecular α-amidoalkylation theoretical enantioselectivity optimization. Online tool for selection and design of chiral catalysts and products.

J Cheminform. 2024 Jan 23;16(1):9. doi: 10.1186/s13321-024-00802-7.

Iterative machine learning-based chemical similarity search to identify novel chemical inhibitors.

J Cheminform. 2023 Sep 23;15(1):86. doi: 10.1186/s13321-023-00760-6.

Limits of Prediction for Machine Learning in Drug Discovery.

Front Pharmacol. 2022 Mar 10;13:832120. doi: 10.3389/fphar.2022.832120. eCollection 2022.

DeepReac+: deep active learning for quantitative modeling of organic chemical reactions.

Chem Sci. 2021 Oct 9;12(43):14459-14472. doi: 10.1039/d1sc02087k. eCollection 2021 Nov 10.

A hybrid framework for improving uncertainty quantification in deep learning-based QSAR regression modeling.

J Cheminform. 2021 Sep 20;13(1):69. doi: 10.1186/s13321-021-00551-x.

Machine learning models for classification tasks related to drug safety.

Mol Divers. 2021 Aug;25(3):1409-1424. doi: 10.1007/s11030-021-10239-x. Epub 2021 Jun 10.

J Cheminform. 2021 Apr 23;13(1):32. doi: 10.1186/s13321-021-00505-3.

Reply to "Missed opportunities in large scale comparison of QSAR and conformal prediction methods and their applications in drug discovery".

J Cheminform. 2019 Nov 6;11(1):64. doi: 10.1186/s13321-019-0388-x.

Bioactivity Comparison across Multiple Machine Learning Algorithms Using over 5000 Datasets for Drug Discovery.

Mol Pharm. 2021 Jan 4;18(1):403-415. doi: 10.1021/acs.molpharmaceut.0c01013. Epub 2020 Dec 16.

Changing the HTS Paradigm: AI-Driven Iterative Screening for Hit Finding.

SLAS Discov. 2021 Feb;26(2):257-262. doi: 10.1177/2472555220949495. Epub 2020 Aug 18.
