Suppr
超能文献

Evaluation of QSAR Equations for Virtual Screening.

Affiliation

Department of Chemistry, Bar-Ilan University, Ramat-Gan 5290002, Israel.

Publication information

Int J Mol Sci. 2020 Oct 22;21(21):7828. doi: 10.3390/ijms21217828.

DOI: 10.3390/ijms21217828

PMID: 33105703

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7672587/

Abstract

Quantitative Structure-Activity Relationship (QSAR) models can reveal correlations between activities and structure-based molecular descriptors. This information is important for understanding the factors that govern molecular properties and for designing new compounds with favorable properties. Because of the large number of calculable descriptors and, consequently, the much larger number of descriptor combinations, the derivation of QSAR models can be treated as an optimization problem. For continuous responses, the metrics typically optimized in this process relate to model performance on the training set, for example R² and Q²CV. Similar metrics calculated on an external dataset (e.g., Q²F1, Q²F2, Q²F3) are used to evaluate the performance of the final models. What these metrics have in common is that they are context-ignorant. In this work, we propose that QSAR models should be evaluated according to their intended use. More specifically, we argue that QSAR models developed for Virtual Screening (VS) should be derived and evaluated using a virtual-screening-aware metric, e.g., an enrichment-based metric. To demonstrate this point, we developed 21 Multiple Linear Regression (MLR) models for seven targets (three models per target), evaluated them first on validation sets, and then tested their performance on two additional test sets constructed to mimic small-scale virtual screening campaigns. As expected, we found no correlation between model performance as evaluated by "classical" metrics (e.g., R² and Q²F1/F2/F3) and the number of active compounds picked by the models from a pool of random compounds. In some cases, models with favorable R² and/or Q²F1/F2/F3 values failed to pick a single active compound from the pool, whereas in other cases models with poor R² and/or Q²F1/F2/F3 values performed well in the context of virtual screening.
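The abstract names the "classical" metrics but does not define them. The sketch below uses their commonly cited definitions (R² on the fitted set, leave-one-out Q²CV for an ordinary least-squares MLR, and Q²F1/Q²F2/Q²F3 on an external set) and places them next to one possible VS-aware measure: the number of actives a model ranks in its top n picks from a mixed pool, plus an enrichment factor. The function names, the top-n cutoff, and the enrichment-factor form are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np


def r2(y, y_hat):
    """Coefficient of determination on the set the model was fitted to."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


def q2_loo_mlr(X, y):
    """Leave-one-out Q2 for an ordinary least-squares MLR with intercept,
    using the hat-matrix shortcut e_i / (1 - h_ii) for the PRESS residuals."""
    y = np.asarray(y, float)
    A = np.column_stack([np.ones(len(y)), np.asarray(X, float)])
    H = A @ np.linalg.pinv(A)  # hat matrix (assumes A has full column rank)
    press = np.sum(((y - H @ y) / (1.0 - np.diag(H))) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def q2_f1(y_ext, y_hat_ext, y_train_mean):
    """External Q2 with residuals referenced to the training-set mean."""
    y_ext, y_hat_ext = np.asarray(y_ext, float), np.asarray(y_hat_ext, float)
    return 1.0 - np.sum((y_ext - y_hat_ext) ** 2) / np.sum((y_ext - y_train_mean) ** 2)


def q2_f2(y_ext, y_hat_ext):
    """External Q2 with residuals referenced to the external-set mean."""
    y_ext, y_hat_ext = np.asarray(y_ext, float), np.asarray(y_hat_ext, float)
    return 1.0 - np.sum((y_ext - y_hat_ext) ** 2) / np.sum((y_ext - y_ext.mean()) ** 2)


def q2_f3(y_ext, y_hat_ext, y_train):
    """External mean squared error relative to the training-set variance."""
    y_ext, y_hat_ext = np.asarray(y_ext, float), np.asarray(y_hat_ext, float)
    y_train = np.asarray(y_train, float)
    mse_ext = np.mean((y_ext - y_hat_ext) ** 2)
    return 1.0 - mse_ext / np.mean((y_train - y_train.mean()) ** 2)


def actives_in_top_n(scores, is_active, n):
    """VS-style count: true actives among the n highest-scoring compounds."""
    order = np.argsort(-np.asarray(scores, float), kind="stable")
    return int(np.asarray(is_active, bool)[order[:n]].sum())


def enrichment_factor(scores, is_active, fraction):
    """Hit rate in the top `fraction` of the ranked pool divided by the hit
    rate of the whole pool (1.0 means no better than random picking)."""
    is_active = np.asarray(is_active, bool)
    n_top = max(1, int(round(fraction * len(is_active))))
    hit_rate_top = actives_in_top_n(scores, is_active, n_top) / n_top
    return hit_rate_top / (is_active.sum() / len(is_active))
```

The contrast the abstract draws shows up in the signatures: the Q² functions only compare predicted and measured activities of compounds with known values, whereas actives_in_top_n and enrichment_factor also depend on how the model scores the random (presumed inactive) compounds in the pool.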
We also found no significant correlation between the numbers of active compounds correctly identified by the models in the training, validation, and test sets. Next, we developed a new algorithm that derives MLR models by optimizing an enrichment-based metric, and tested its performance on the same datasets. In most cases, the best models derived in this manner gave much more consistent results across the training, validation, and test sets, and outperformed the corresponding MLR models in most virtual screening tests. Finally, we showed that, when tested as binary classifiers, models derived for the same targets by the new algorithm outperformed Random Forest (RF)- and Support Vector Machine (SVM)-based models across the training, validation, and test sets in most cases. We attribute the better VS performance of the Enrichment Optimizer Algorithm (EOA) models to their better handling of inactive random compounds. Optimizing an enrichment-based metric is therefore a promising strategy for deriving QSAR models for classification and virtual screening.
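The abstract does not describe how the enrichment optimizer works internally. As a rough illustration of the general idea only (deriving a linear equation by maximizing an enrichment-style objective instead of fitting by least squares), the following sketch tunes the weights of a linear score X @ w by simple random hill climbing, counting how many actives land in the top n of a pool of actives and random decoys. The optimizer, the objective, the step size, the toy data, and all names are assumptions; this is not the published EOA.

```python
import numpy as np


def top_n_hits(scores, is_active, n):
    """Number of actives among the n highest-scoring compounds."""
    order = np.argsort(-scores, kind="stable")
    return int(is_active[order[:n]].sum())


def fit_enrichment_linear(X, is_active, n_top, n_iter=2000, step=0.1, seed=0):
    """Hypothetical sketch, NOT the published EOA: tune the weights w of the
    linear score X @ w to maximize the number of actives ranked in the top
    n_top, using random hill climbing. No intercept is fitted because a
    constant offset does not change the ranking."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, float)
    is_active = np.asarray(is_active, bool)
    w = rng.normal(size=X.shape[1])  # random starting weights
    best = top_n_hits(X @ w, is_active, n_top)
    for _ in range(n_iter):
        candidate = w + step * rng.normal(size=w.shape)  # small random move
        hits = top_n_hits(X @ candidate, is_active, n_top)
        if hits >= best:  # accept ties so the search can drift across plateaus
            w, best = candidate, hits
    return w, best


# Toy data (synthetic, not from the paper): 10 actives shifted along
# descriptor 0, mixed into 90 random "decoy" compounds.
rng = np.random.default_rng(1)
X_act = rng.normal(size=(10, 3))
X_act[:, 0] += 3.0
X = np.vstack([X_act, rng.normal(size=(90, 3))])
is_active = np.array([True] * 10 + [False] * 90)
w, hits = fit_enrichment_linear(X, is_active, n_top=10)
print(f"actives in top 10: {hits}/10, weights: {np.round(w, 2)}")
```

Accepting ties matters because a top-n count is piecewise constant in w: small weight changes usually leave the ranking unchanged, so there is no useful gradient, which is why a derivative-free search is used in this sketch.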

Figure from the article: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/83c4/7672587/8983cb8c4b31/ijms-21-07828-g001.jpg

Similar articles

A Comparison between Enrichment Optimization Algorithm (EOA)-Based and Docking-Based Virtual Screening.

Int J Mol Sci. 2021 Dec 21;23(1):43. doi: 10.3390/ijms23010043.

Towards an Enrichment Optimization Algorithm (EOA)-based Target Specific Docking Functions for Virtual Screening.

Mol Inform. 2022 Nov;41(11):e2200034. doi: 10.1002/minf.202200034. Epub 2022 Jul 26.

Application of GA-MLR for QSAR Modeling of the Arylthioindole Class of Tubulin Polymerization Inhibitors as Anticancer Agents.

Anticancer Agents Med Chem. 2017;17(4):552-565. doi: 10.2174/1871520616666160811162105.

QSAR studies of the bioactivity of hepatitis C virus (HCV) NS3/4A protease inhibitors by multiple linear regression (MLR) and support vector machine (SVM).

Bioorg Med Chem Lett. 2017 Jul 1;27(13):2931-2938. doi: 10.1016/j.bmcl.2017.05.001. Epub 2017 May 3.

Do we need different machine learning algorithms for QSAR modeling? A comprehensive assessment of 16 machine learning algorithms on 14 QSAR data sets.

Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa321.

Classification and QSAR models of leukotriene A4 hydrolase (LTA4H) inhibitors by machine learning methods.

SAR QSAR Environ Res. 2021 May;32(5):411-431. doi: 10.1080/1062936X.2021.1910862. Epub 2021 Apr 26.

Binary classification of a large collection of environmental chemicals from estrogen receptor assays by quantitative structure-activity relationship and machine learning methods.

J Chem Inf Model. 2013 Dec 23;53(12):3244-61. doi: 10.1021/ci400527b. Epub 2013 Dec 11.

Application of validated QSAR models of D1 dopaminergic antagonists for database mining.

J Med Chem. 2005 Nov 17;48(23):7322-32. doi: 10.1021/jm049116m.

[Quantitative structure-activity relationship model for prediction of cardiotoxicity of chemical components in traditional Chinese medicines].

Beijing Da Xue Xue Bao Yi Xue Ban. 2017 Jun 18;49(3):551-556.

Cited by

One size does not fit all: revising traditional paradigms for assessing accuracy of QSAR models used for virtual screening.

J Cheminform. 2025 Jan 16;17(1):7. doi: 10.1186/s13321-025-00948-y.

Recent Advances in Machine-Learning-Based Chemoinformatics: A Comprehensive Review.

Int J Mol Sci. 2023 Jul 15;24(14):11488. doi: 10.3390/ijms241411488.

Towards an Enrichment Optimization Algorithm (EOA)-based Target Specific Docking Functions for Virtual Screening.

Mol Inform. 2022 Nov;41(11):e2200034. doi: 10.1002/minf.202200034. Epub 2022 Jul 26.

QSAR, ADMET In Silico Pharmacokinetics, Molecular Docking and Molecular Dynamics Studies of Novel Bicyclo (Aryl Methyl) Benzamides as Potent GlyT1 Inhibitors for the Treatment of Schizophrenia.

Pharmaceuticals (Basel). 2022 May 27;15(6):670. doi: 10.3390/ph15060670.

Virtual Combinatorial Chemistry and Pharmacological Screening: A Short Guide to Drug Design.

Int J Mol Sci. 2022 Jan 30;23(3):1620. doi: 10.3390/ijms23031620.

A Comparison between Enrichment Optimization Algorithm (EOA)-Based and Docking-Based Virtual Screening.

Int J Mol Sci. 2021 Dec 21;23(1):43. doi: 10.3390/ijms23010043.

var. for Treating Rheumatoid Arthritis-An Assessment Combining Machine Learning-Guided ADME Properties Prediction, Network Pharmacology, and Pharmacological Assessment.

Front Pharmacol. 2021 Oct 4;12:704040. doi: 10.3389/fphar.2021.704040. eCollection 2021.
