Influence relevance voting: an accurate and interpretable virtual high throughput screening method.

Authors

Swamidass S Joshua, Azencott Chloé-Agathe, Lin Ting-Wan, Gramajo Hugo, Tsai Shiou-Chuan, Baldi Pierre

Affiliations

School of Information and Computer Sciences, Institute for Genomics and Bioinformatics, University of California, Irvine, Irvine, California 92697-3435, USA.

Publication information

J Chem Inf Model. 2009 Apr;49(4):756-66. doi: 10.1021/ci8004379.

Abstract

Given activity training data from high-throughput screening (HTS) experiments, virtual high-throughput screening (vHTS) methods aim to predict in silico the activity of untested chemicals. We present a novel method, the Influence Relevance Voter (IRV), specifically tailored for the vHTS task. The IRV is a low-parameter neural network which refines a k-nearest neighbor classifier by nonlinearly combining the influences of a chemical's neighbors in the training set. Influences are decomposed, also nonlinearly, into a relevance component and a vote component. The IRV is benchmarked using the data and rules of two large, open competitions, and its performance is compared with that of the other participating methods, as well as that of an in-house support vector machine (SVM) method. On these benchmark data sets, the IRV achieves state-of-the-art results, comparable to the SVM in one case, and significantly better than the SVM in the other, retrieving three times as many actives in the top 1% of its prediction-sorted list. The IRV presents several other important advantages over SVMs and other methods: (1) the output predictions have probabilistic semantics; (2) the underlying inferences are interpretable; (3) the training time is very short, on the order of minutes even for very large data sets; (4) the risk of overfitting is minimal, due to the small number of free parameters; and (5) additional information can easily be incorporated into the IRV architecture. Combined with its performance, these qualities make the IRV particularly well suited for vHTS.
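
The abstract describes the IRV only in words, so the following is a minimal sketch of how such a model could be wired together, not the authors' implementation. It assumes one plausible parameterization: each neighbor's relevance is a sigmoid of its similarity and rank, its vote is a learned weight that depends on whether the neighbor is active, the influence is the product of the two, and the output is a sigmoid of the summed influences. The function names, the weight names, and the example weight values are all hypothetical and untrained.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def irv_predict(neighbors, w):
    """Score one query compound from its k nearest training neighbors.

    neighbors: list of (similarity, is_active) pairs, sorted by
               decreasing similarity to the query (assumed input format).
    w:         dict holding the six free parameters of this sketch.
    Returns (probability_of_activity, per_neighbor_influences).
    """
    total = w["bias"]
    influences = []
    for rank, (similarity, is_active) in enumerate(neighbors, start=1):
        # Relevance: how much this neighbor should count (assumed form).
        relevance = sigmoid(w["rel_bias"] + w["sim"] * similarity + w["rank"] * rank)
        # Vote: direction and strength, depending only on the neighbor's class.
        vote = w["vote_active"] if is_active else w["vote_inactive"]
        influence = relevance * vote
        influences.append(influence)
        total += influence
    # Nonlinear combination of the influences gives a value in (0, 1).
    return sigmoid(total), influences


# Illustrative, untrained weights (hypothetical values).
weights = {
    "bias": -1.0,
    "rel_bias": 0.0,
    "sim": 3.0,
    "rank": -0.2,
    "vote_active": 2.0,
    "vote_inactive": -1.0,
}
query_neighbors = [(0.92, True), (0.85, True), (0.60, False)]
probability, influences = irv_predict(query_neighbors, weights)
print(round(probability, 3), [round(i, 3) for i in influences])
```

Under these assumptions the model has only six free parameters regardless of training-set size, and the per-neighbor influences returned alongside the prediction show which training compounds pushed it toward active or inactive, which is one way to read the abstract's claims about low overfitting risk and interpretability.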
