School of Pharmacy, University of Hertfordshire, College Lane, Hatfield AL10 9AB, United Kingdom.
J Chem Inf Model. 2012 Feb 27;52(2):420-8. doi: 10.1021/ci200387c. Epub 2012 Jan 13.
The General Solubility Equation (GSE) is a QSPR model based on the melting point and log P of a chemical substance. It is used to predict the aqueous solubility of nonionizable chemical compounds. However, its reliance on experimentally derived descriptors, particularly melting point, limits its applicability to virtual compounds. The studies presented show that the GSE is able to predict, to within 1 log unit, the experimental aqueous solubility (log S) for 81% of the compounds in a data set of 1265 diverse chemical structures (-8.48 < log S < 1.58). However, the predictive ability of the GSE is reduced to 75% when it is applied to a subset of the data (1160 compounds, -6.00 < log S < 0.00) that excludes the compounds occupying the sparsely populated regions of data space. This highlights how the sparsely populated extremities of a data set can significantly skew results for linear regression-based models. Replacing the melting point descriptor of the GSE with a descriptor that accounts for topographical polar surface area (TPSA) produces a model of comparable quality to the GSE (the solubility of 81% of compounds in the full data set is predicted to within 1 log unit). As such, we propose an alternative simple model for predicting aqueous solubility which replaces the melting point descriptor of the GSE with TPSA and hence can be applied to virtual compounds. In addition, combining TPSA with log P and melting point gives a three-descriptor model that improves accurate prediction of aqueous solubility over the GSE by 5.1% for the full data set and 6.6% for the reduced data set.
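The abstract does not state the equation itself, so the following is only a minimal sketch of how the "within 1 log unit" figure could be computed. It assumes the commonly cited form of the GSE, log S = 0.5 - 0.01(MP - 25) - log P, with the melting point MP in degrees Celsius and the melting point term set to zero for compounds that melt below 25 °C. The function names `gse_log_s` and `fraction_within` are hypothetical, and the coefficients of the TPSA-based models are not given here and are therefore not reproduced.

```python
from typing import Sequence


def gse_log_s(melting_point_c: float, log_p: float) -> float:
    """Estimate log S (mol/L) with the commonly cited GSE form.

    Assumption: log S = 0.5 - 0.01 * (MP - 25) - log P, where the
    melting point term is dropped for compounds melting below 25 C.
    """
    mp_term = max(melting_point_c - 25.0, 0.0)
    return 0.5 - 0.01 * mp_term - log_p


def fraction_within(predicted: Sequence[float],
                    observed: Sequence[float],
                    tolerance: float = 1.0) -> float:
    """Fraction of compounds whose absolute error is <= tolerance log units."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(p - o) <= tolerance for p, o in zip(predicted, observed))
    return hits / len(predicted)


if __name__ == "__main__":
    # Illustrative values only, not taken from the paper's data set.
    compounds = [(125.0, 2.0), (10.0, 1.0), (200.0, 3.5)]
    observed = [-2.0, -2.0, -5.0]
    predicted = [gse_log_s(mp, lp) for mp, lp in compounds]
    print(predicted, fraction_within(predicted, observed))
```

Applying `fraction_within` first to a full data set and then to a subset restricted by observed log S would correspond to the two accuracy figures (81% and 75%) reported for the GSE.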