Prediction of aqueous solubility based on large datasets using several QSPR models utilizing topological structure representation.

Author information

Votano Joseph R, Parham Marc, Hall Lowell H, Kier Lemont B, Hall L Mark

Affiliation

ChemSilico LLC, 48 Baldwin Street, Tewksbury, MA 01876, USA.

Publication information

Chem Biodivers. 2004 Nov;1(11):1829-41. doi: 10.1002/cbdv.200490137.

PMID:17191819

Abstract

Several QSPR models were developed for predicting intrinsic aqueous solubility, S(o). A data set of 5,964 neutral compounds was subdivided into two classes, aromatic and non-aromatic compounds. Three models were created with different methods on both data sets: two regression models (multiple linear regression and partial least squares) and an artificial neural network (ANN) model. The training sets comprised 3,343 aromatic and 1,674 non-aromatic compounds; 938 compounds were used in external validation testing. The values of -log S(o) range from -1.6 to 10. Topological structure descriptors were used with all models. A genetic algorithm was used for descriptor selection in the regression models. For the ANN model, descriptor selection was done with a backward elimination process. All models performed well, with r² values ranging from 0.72 to 0.84 in external validation testing. The mean absolute errors in validation ranged from 0.44 to 0.80 across the classes of compounds for all the models. These statistical results indicate a sound ANN model. Furthermore, in a comparison with eight other available models, based on predictions for a validation test set of 442 compounds, the ANN model presented in this work (CSLogWS) was clearly superior in both the mean absolute error and the percentage of residuals less than one log unit. In the ANN model, both E-State and hydrogen E-State descriptors were found to be important.
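The three validation statistics named in the abstract (r², mean absolute error, and the percentage of residuals below one log unit) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `validation_metrics` is hypothetical, the input values are made up for illustration, and r² is assumed here to be the squared Pearson correlation between observed and predicted values, which the abstract does not specify.

```python
def validation_metrics(observed, predicted):
    """Return (r2, mae, pct_within_one) for paired -log S(o) values.

    Assumption: r2 is the squared Pearson correlation between the
    observed and predicted values; the abstract does not define it.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    # Sums of cross-products and squared deviations about the means.
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0 or var_pred == 0:
        raise ValueError("r2 is undefined when either sequence is constant")
    r2 = cov ** 2 / (var_obs * var_pred)
    # Absolute residuals in log units.
    residuals = [abs(o - p) for o, p in zip(observed, predicted)]
    mae = sum(residuals) / n
    # Share of compounds predicted to within one log unit (strictly less than 1).
    pct_within_one = 100.0 * sum(r < 1.0 for r in residuals) / n
    return r2, mae, pct_within_one


# Made-up illustrative values, not data from the paper.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.0, 2.5, 5.5, 5.0]
r2, mae, pct = validation_metrics(observed, predicted)
print(f"r2={r2:.3f}  MAE={mae:.2f}  within one log unit={pct:.0f}%")
```

Reporting the percentage of residuals under one log unit alongside the mean absolute error is useful because a few large misses can hide behind an acceptable average.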

Similar articles

Prediction of aqueous solubility based on large datasets using several QSPR models utilizing topological structure representation.

Chem Biodivers. 2004 Nov;1(11):1829-41. doi: 10.1002/cbdv.200490137.

Wavelet neural network modeling in QSPR for prediction of solubility of 25 anthraquinone dyes at different temperatures and pressures in supercritical carbon dioxide.

J Mol Graph Model. 2006 Sep;25(1):46-54. doi: 10.1016/j.jmgm.2005.10.012. Epub 2005 Dec 5.

QSAR modeling of human serum protein binding with several modeling techniques utilizing structure-information representation.

J Med Chem. 2006 Nov 30;49(24):7169-81. doi: 10.1021/jm051245v.

Prediction of the aqueous solubility of benzylamine salts using QSPR model.

J Pharm Biomed Anal. 2005 Feb 23;37(2):411-5. doi: 10.1016/j.jpba.2004.11.005.

Prediction of aqueous solubility and partition coefficient optimized by a genetic algorithm based descriptor selection method.

J Chem Inf Comput Sci. 2003 May-Jun;43(3):1077-84. doi: 10.1021/ci034006u.

Prediction of HPLC retention index using artificial neural networks and IGroup E-state indices.

J Chem Inf Model. 2009 Apr;49(4):788-99. doi: 10.1021/ci9000162.

Predictive QSAR modeling of HIV reverse transcriptase inhibitor TIBO derivatives.

Eur J Med Chem. 2009 Apr;44(4):1509-24. doi: 10.1016/j.ejmech.2008.07.020. Epub 2008 Jul 24.

New QSPR study for the prediction of aqueous solubility of drug-like compounds.

Bioorg Med Chem. 2008 Sep 1;16(17):7944-55. doi: 10.1016/j.bmc.2008.07.067. Epub 2008 Jul 29.

Linear and nonlinear quantitative structure-property relationship models for solubility of some anthraquinone, anthrone and xanthone derivatives in supercritical carbon dioxide.

Anal Chim Acta. 2008 Mar 3;610(1):25-34. doi: 10.1016/j.aca.2008.01.011. Epub 2008 Jan 15.

Prediction of impact sensitivity of nitro energetic compounds by neural network based on electrotopological-state indices.

J Hazard Mater. 2009 Jul 15;166(1):155-86. doi: 10.1016/j.jhazmat.2008.11.005. Epub 2008 Nov 13.

Cited by

Chains of Commerce: A Comprehensive Review of Animal Welfare Impacts in the International Wildlife Trade.

Animals (Basel). 2025 Mar 27;15(7):971. doi: 10.3390/ani15070971.

Will we ever be able to accurately predict solubility?

Sci Data. 2024 Mar 18;11(1):303. doi: 10.1038/s41597-024-03105-6.

A multiplex metabolomic approach for quality control of Spirulina supplement and its allied microalgae (Amphora & Chlorella) assisted by chemometrics and molecular networking.

Sci Rep. 2024 Feb 2;14(1):2809. doi: 10.1038/s41598-024-53219-5.

Tailor-made solvents for pharmaceutical use? Experimental and computational approach for determining solubility in deep eutectic solvents (DES).

Int J Pharm X. 2019 Oct 31;1:100034. doi: 10.1016/j.ijpx.2019.100034. eCollection 2019 Dec.

Prediction of the partition coefficients using QSPR modeling and simulation of paclitaxel release from the diffusion-controlled drug delivery devices.

Drug Deliv Transl Res. 2018 Oct;8(5):1300-1312. doi: 10.1007/s13346-018-0530-8.

Binary classification of aqueous solubility using support vector machines with reduction and recombination feature selection.

J Chem Inf Model. 2011 Feb 28;51(2):229-36. doi: 10.1021/ci100364a. Epub 2011 Jan 7.

CE50: quantifying collision induced dissociation energy for small molecule characterization and identification.

J Am Soc Mass Spectrom. 2009 Sep;20(9):1759-67. doi: 10.1016/j.jasms.2009.06.002. Epub 2009 Jun 21.

Three-class classification models of logS and logP derived by using GA-CG-SVM approach.

Mol Divers. 2009 May;13(2):261-8. doi: 10.1007/s11030-009-9108-1. Epub 2009 Jan 31.

Antiplasmodial activity of [(aryl)arylsulfanylmethyl]Pyridine.

Antimicrob Agents Chemother. 2008 Feb;52(2):705-15. doi: 10.1128/AAC.00898-07. Epub 2007 Nov 19.

Chemogenomic approaches to rational drug design.

Br J Pharmacol. 2007 Sep;152(1):38-52. doi: 10.1038/sj.bjp.0707307. Epub 2007 May 29.
