Enrichment of high-throughput screening data with increasing levels of noise using support vector machines, recursive partitioning, and Laplacian-modified naive Bayesian classifiers.

Author information

Glick Meir, Jenkins Jeremy L, Nettles James H, Hitchings Hamilton, Davies John W

Affiliation

Lead Discovery Center, Novartis Institutes for Biomedical Research Inc., Cambridge, Massachusetts 02139, USA.

Publication information

J Chem Inf Model. 2006 Jan-Feb;46(1):193-200. doi: 10.1021/ci050374h.

DOI:10.1021/ci050374h

PMID:16426055

Abstract

High-throughput screening (HTS) plays a pivotal role in lead discovery for the pharmaceutical industry. In tandem, cheminformatics approaches are employed to increase the probability of the identification of novel biologically active compounds by mining the HTS data. HTS data is notoriously noisy, and therefore, the selection of the optimal data mining method is important for the success of such an analysis. Here, we describe a retrospective analysis of four HTS data sets using three mining approaches: Laplacian-modified naive Bayes, recursive partitioning, and support vector machine (SVM) classifiers with increasing stochastic noise in the form of false positives and false negatives. All three of the data mining methods at hand tolerated increasing levels of false positives even when the ratio of misclassified compounds to true active compounds was 5:1 in the training set. False negatives in the ratio of 1:1 were tolerated as well. SVM outperformed the other two methods in capturing active compounds and scaffolds in the top 1%. A Murcko scaffold analysis could explain the differences in enrichments among the four data sets. This study demonstrates that data mining methods can add a true value to the screen even when the data is contaminated with a high level of stochastic noise.
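
The noise experiment described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' protocol: the function names `inject_noise` and `enrichment_at_top` are hypothetical, and the reading of the ratios is an assumption (false positives are counted per original true active, and a 1:1 false-negative ratio is taken to mean one relabeled active for every active left in place). Any classifier could supply the scores.

```python
import random


def inject_noise(labels, fp_ratio=0.0, fn_ratio=0.0, seed=0):
    """Return a noisy copy of binary training labels (1 = active, 0 = inactive).

    Assumptions (not taken from the paper):
      fp_ratio: false positives added per original true active (5.0 ~ "5:1").
      fn_ratio: false negatives per active left labeled active (1.0 ~ "1:1").
    """
    rng = random.Random(seed)
    noisy = list(labels)
    actives = [i for i, y in enumerate(labels) if y == 1]
    inactives = [i for i, y in enumerate(labels) if y == 0]

    # False negatives: relabel a random subset of true actives as inactive.
    n_fn = round(len(actives) * fn_ratio / (1.0 + fn_ratio))
    for i in rng.sample(actives, n_fn):
        noisy[i] = 0

    # False positives: relabel a random subset of true inactives as active.
    n_fp = min(len(inactives), round(len(actives) * fp_ratio))
    for i in rng.sample(inactives, n_fp):
        noisy[i] = 1

    return noisy


def enrichment_at_top(scores, true_labels, fraction=0.01):
    """Hit rate among the top-scoring fraction divided by the overall hit rate."""
    n = len(scores)
    total_hits = sum(true_labels)
    if n == 0 or total_hits == 0:
        raise ValueError("need at least one compound and one true active")
    n_top = max(1, round(n * fraction))
    # Rank compounds from highest to lowest model score.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits_top = sum(true_labels[i] for i in ranked[:n_top])
    return (hits_top / n_top) / (total_hits / n)


if __name__ == "__main__":
    # Toy screen: 10 true actives among 1000 compounds.
    truth = [1] * 10 + [0] * 990
    train = inject_noise(truth, fp_ratio=5.0, fn_ratio=1.0)
    print("labeled actives after noise:", sum(train))
```

In such a setup, a model would be trained on the noisy labels and then evaluated with `enrichment_at_top` against the true labels, so that the effect of the injected noise on the top 1% can be measured.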

Similar articles

Identifying actives from HTS data sets: practical approaches for the selection of an appropriate HTS data-processing method and quality control review.

J Biomol Screen. 2011 Jan;16(1):1-14. doi: 10.1177/1087057110389039. Epub 2010 Dec 15.

GPU accelerated support vector machines for mining high-throughput screening data.

J Chem Inf Model. 2009 Dec;49(12):2718-25. doi: 10.1021/ci900337f.

Comparative study of machine-learning and chemometric tools for analysis of in-vivo high-throughput screening data.

J Chem Inf Model. 2008 Aug;48(8):1663-8. doi: 10.1021/ci800142d. Epub 2008 Aug 6.

Enhanced HTS hit selection via a local hit rate analysis.

J Chem Inf Model. 2009 Oct;49(10):2202-10. doi: 10.1021/ci900113d.

Bayesian model averaging for ligand discovery.

J Chem Inf Model. 2009 Jun;49(6):1547-57. doi: 10.1021/ci900046u.

Mass spectrometric techniques for label-free high-throughput screening in drug discovery.

Anal Chem. 2007 Nov 1;79(21):8207-13. doi: 10.1021/ac062421q. Epub 2007 Sep 29.

Data mining PubChem using a support vector machine with the Signature molecular descriptor: classification of factor XIa inhibitors.

J Mol Graph Model. 2008 Nov;27(4):466-75. doi: 10.1016/j.jmgm.2008.08.004. Epub 2008 Aug 27.

Enrichment of extremely noisy high-throughput screening data using a naïve Bayes classifier.

J Biomol Screen. 2004 Feb;9(1):32-6. doi: 10.1177/1087057103260590.

Use of recursion forests in the sequential screening process: consensus selection by multiple recursion trees.

J Chem Inf Comput Sci. 2003 May-Jun;43(3):941-8. doi: 10.1021/ci034023j.

Cited by

A two-stage dominance-based surrogate-assisted evolution algorithm for high-dimensional expensive multi-objective optimization.

Sci Rep. 2023 Aug 13;13(1):13163. doi: 10.1038/s41598-023-40019-6.

Machine Learning Models for Predicting Liver Toxicity.

Methods Mol Biol. 2022;2425:393-415. doi: 10.1007/978-1-0716-1960-5_15.

OptiPharm: An evolutionary algorithm to compare shape similarity.

Sci Rep. 2019 Feb 4;9(1):1398. doi: 10.1038/s41598-018-37908-6.

Linking High-Throughput Screens to Identify MoAs and Novel Inhibitors of Mycobacterium tuberculosis Dihydrofolate Reductase.

ACS Chem Biol. 2017 Sep 15;12(9):2448-2456. doi: 10.1021/acschembio.7b00468. Epub 2017 Aug 29.

Identification of novel MRP3 inhibitors based on computational models and validation using an in vitro membrane vesicle assay.

Eur J Pharm Sci. 2017 May 30;103:52-59. doi: 10.1016/j.ejps.2017.02.011. Epub 2017 Feb 24.

Predicting DPP-IV inhibitors with machine learning approaches.

J Comput Aided Mol Des. 2017 Apr;31(4):393-402. doi: 10.1007/s10822-017-0009-6. Epub 2017 Feb 2.

Data-driven approaches used for compound library design, hit triage and bioactivity modeling in high-throughput screening.

Brief Bioinform. 2018 Mar 1;19(2):277-285. doi: 10.1093/bib/bbw105.

PubChem structure-activity relationship (SAR) clusters.

J Cheminform. 2015 Jul 7;7:33. doi: 10.1186/s13321-015-0070-x. eCollection 2015.

LBVS: an online platform for ligand-based virtual screening using publicly accessible databases.

Mol Divers. 2014 Nov;18(4):829-40. doi: 10.1007/s11030-014-9545-3. Epub 2014 Sep 3.

Has discovery-based cancer research been a bust?

Clin Transl Oncol. 2013 Nov;15(11):865-70. doi: 10.1007/s12094-013-1071-8. Epub 2013 Sep 4.
