One- to four-dimensional kernels for virtual screening and the prediction of physical, chemical, and biological properties.

Authors

Azencott Chloé-Agathe, Ksikes Alexandre, Swamidass S Joshua, Chen Jonathan H, Ralaivola Liva, Baldi Pierre

Affiliation

School of Information and Computer Sciences, University of California-Irvine, Irvine, California 92697-3435, USA.

Publication information

J Chem Inf Model. 2007 May-Jun;47(3):965-74. doi: 10.1021/ci600397p. Epub 2007 Mar 6.

DOI:10.1021/ci600397p

PMID:17338509

Abstract

Many chemoinformatics applications, including high-throughput virtual screening, benefit from being able to rapidly predict the physical, chemical, and biological properties of small molecules to screen large repositories and identify suitable candidates. When training sets are available, machine learning methods provide an effective alternative to ab initio methods for these predictions. Here, we leverage rich molecular representations including 1D SMILES strings, 2D graphs of bonds, and 3D coordinates to derive efficient machine learning kernels to address regression problems. We further expand the library of available spectral kernels for small molecules developed for classification problems to include 2.5D surface and 3D kernels using Delaunay tetrahedrization and other techniques from computational geometry, 3D pharmacophore kernels, and 3.5D or 4D kernels capable of taking into account multiple molecular configurations, such as conformers. The kernels are comprehensively tested using cross-validation and redundancy-reduction methods on regression problems using several available data sets to predict boiling points, melting points, aqueous solubility, octanol/water partition coefficients, and biological activity with state-of-the-art results. When sufficient training data are available, 2D spectral kernels in general tend to yield the best and most robust results, better than the state of the art. On data sets containing thousands of molecules, the kernels achieve a squared correlation coefficient of 0.91 for aqueous solubility prediction and 0.94 for octanol/water partition coefficient prediction. Averaging over conformations improves the performance of kernels based on the three-dimensional structure of molecules, especially on challenging data sets. Kernel predictors for aqueous solubility (kSOL), LogP (kLOGP), and melting point (kMELT) are available over the Web at http://cdb.ics.uci.edu.
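To make the idea of a 2D spectral kernel concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a spectral kernel represents each molecule by the counts of labeled paths found in its bond graph and compares two such count vectors with a MinMax similarity (which reduces to the Tanimoto similarity on binary features). The function names `path_spectrum` and `minmax_kernel`, the toy molecule encoding, and the hydrogen-suppressed example molecules are all hypothetical choices made for illustration.

```python
from collections import Counter


def path_spectrum(atoms, bonds, max_len=3):
    """Count labeled simple paths with at most max_len bonds.

    atoms: list of element symbols, indexed by atom number.
    bonds: list of (i, j, bond_symbol) tuples, e.g. (0, 1, "-").
    Each path is counted once per starting atom, so a path and its
    reverse are recorded as separate labels (e.g. "C-O" and "O-C").
    """
    adjacency = {i: [] for i in range(len(atoms))}
    for i, j, bond in bonds:
        adjacency[i].append((j, bond))
        adjacency[j].append((i, bond))

    counts = Counter()

    def walk(node, visited, label, depth):
        counts[label] += 1
        if depth == max_len:
            return
        for neighbor, bond in adjacency[node]:
            if neighbor not in visited:  # simple paths only, no revisits
                walk(neighbor, visited | {neighbor},
                     label + bond + atoms[neighbor], depth + 1)

    for start in range(len(atoms)):
        walk(start, {start}, atoms[start], 0)
    return counts


def minmax_kernel(spectrum_a, spectrum_b):
    """MinMax similarity between two path-count spectra."""
    keys = spectrum_a.keys() | spectrum_b.keys()
    shared = sum(min(spectrum_a.get(k, 0), spectrum_b.get(k, 0)) for k in keys)
    total = sum(max(spectrum_a.get(k, 0), spectrum_b.get(k, 0)) for k in keys)
    return shared / total if total else 0.0


# Toy hydrogen-suppressed graphs (assumed encoding): ethanol C-C-O, methanol C-O.
ethanol = path_spectrum(["C", "C", "O"], [(0, 1, "-"), (1, 2, "-")])
methanol = path_spectrum(["C", "O"], [(0, 1, "-")])

print(minmax_kernel(ethanol, methanol))
print(minmax_kernel(ethanol, ethanol))
```

In a regression setting such as the one described in the abstract, the pairwise values of a kernel like this would populate a Gram matrix that is handed to a kernel-based learner. A practical pipeline would also parse structures with a cheminformatics toolkit rather than hand-built graphs.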

Similar articles

Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules.

J Chem Inf Model. 2013 Jul 22;53(7):1563-75. doi: 10.1021/ci400187y. Epub 2013 Jul 2.

Virtual screening with support vector machines and structure kernels.

Comb Chem High Throughput Screen. 2009 May;12(4):409-23. doi: 10.2174/138620709788167926.

Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity.

Bioinformatics. 2005 Jun;21 Suppl 1:i359-68. doi: 10.1093/bioinformatics/bti1055.

Graph kernels for chemical informatics.

Neural Netw. 2005 Oct;18(8):1093-110. doi: 10.1016/j.neunet.2005.07.009. Epub 2005 Sep 12.

Predicting Melting Points of Organic Molecules: Applications to Aqueous Solubility Prediction Using the General Solubility Equation.

Mol Inform. 2015 Nov;34(11-12):715-24. doi: 10.1002/minf.201500052. Epub 2015 Jul 20.

Prediction of aqueous solubility from SCRATCH.

Int J Pharm. 2010 Jan 29;385(1-2):1-5. doi: 10.1016/j.ijpharm.2009.10.003. Epub 2009 Oct 9.

Why are some properties more difficult to predict than others? A study of QSPR models of solubility, melting point, and Log P.

J Chem Inf Model. 2008 Jan;48(1):220-32. doi: 10.1021/ci700307p. Epub 2008 Jan 11.

Comments on prediction of the aqueous solubility using the general solubility equation (GSE) versus a genetic algorithm and a support vector machine model.

Pharm Dev Technol. 2018 Sep;23(7):739-740. doi: 10.1080/10837450.2017.1321663. Epub 2017 May 8.

Scores of extended connectivity fingerprint as descriptors in QSPR study of melting point and aqueous solubility.

J Chem Inf Model. 2008 May;48(5):981-7. doi: 10.1021/ci800024c. Epub 2008 May 9.

Cited by

Rapid Assessment of Virtually Synthesizable Chemical Structures via Support Vector Machine Models.

Mol Inform. 2025 Jul;44(7):e202500039. doi: 10.1002/minf.70000.

A multiple classifier system identifies novel cannabinoid CB2 receptor ligands.

J Cheminform. 2019 Nov 7;11(1):66. doi: 10.1186/s13321-019-0389-9.

Computational/in silico methods in drug target and lead prediction.

Brief Bioinform. 2020 Sep 25;21(5):1663-1675. doi: 10.1093/bib/bbz103.

Efficient multi-task chemogenomics for drug specificity prediction.

PLoS One. 2018 Oct 4;13(10):e0204999. doi: 10.1371/journal.pone.0204999. eCollection 2018.

Predicted Biological Activity of Purchasable Chemical Space.

J Chem Inf Model. 2018 Jan 22;58(1):148-164. doi: 10.1021/acs.jcim.7b00316. Epub 2017 Dec 29.

Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules.

J Chem Inf Model. 2013 Jul 22;53(7):1563-75. doi: 10.1021/ci400187y. Epub 2013 Jul 2.

Learning to predict chemical reactions.

J Chem Inf Model. 2011 Sep 26;51(9):2209-22. doi: 10.1021/ci200207y. Epub 2011 Sep 2.

A CROC stronger than ROC: measuring, visualizing and optimizing early retrieval.

Bioinformatics. 2010 May 15;26(10):1348-56. doi: 10.1093/bioinformatics/btq140. Epub 2010 Apr 7.

Estimation of the applicability domain of kernel-based machine learning models for virtual screening.

J Cheminform. 2010 Mar 11;2(1):2. doi: 10.1186/1758-2946-2-2.

A constructive approach for discovering new drug leads: Using a kernel methodology for the inverse-QSAR problem.

J Cheminform. 2009 Apr 28;1:4. doi: 10.1186/1758-2946-1-4.
