Raevsky Oleg A, Polianczyk Daniel E, Grigorev Veniamin Yu, Raevskaja Olga E, Dearden John C
Department of Computer-Aided Molecular Design, Institute of Physiologically Active Compounds, Russian Academy of Sciences, 142432, Russia, Chernogolovka, Severniy proezd 1; phone: +7 496 52 21867.
School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool L3 3AF, UK.
Mol Inform. 2015 Jun;34(6-7):417-30. doi: 10.1002/minf.201400144. Epub 2015 Jun 18.
Thirty-two Quantitative Structure-Property Relationship (QSPR) models were constructed to predict the aqueous intrinsic solubility of liquid and crystalline chemicals. The data sets contained 1022 liquid and 2615 crystalline compounds. Multiple Linear Regression (MLR), Support Vector Machine (SVM) and Random Forest (RF) methods were used to construct global models, and k-nearest neighbour (kNN), Arithmetic Mean Property (AMP) and Local Regression Property (LoReP) methods were used to construct local models. A set of the best QSPR models was obtained, with a root mean square error (RMSE) of prediction in the range 0.50-0.60 log unit for liquid chemicals and 0.80-0.90 log unit for crystalline chemicals. In the global models, the large number of descriptors makes mechanistic interpretation difficult. The local models use only one or two descriptors, so a medicinal chemist working with sets of structurally related chemicals can readily estimate their solubility. However, constructing stable local models requires closely related neighbours for each chemical considered. A consensus of global and local QSPR models will probably be the optimal approach for constructing stable predictive QSPR models with mechanistic interpretation.
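The idea of a local model can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define AMP or LoReP in detail, so the code below simply assumes that a local estimate is the arithmetic mean of the property over the k nearest neighbours, with Euclidean distance in a one- or two-descriptor space. The function names `knn_mean_predict` and `rmse`, the descriptor values and the solubility values are all hypothetical and invented for illustration.

```python
import math


def knn_mean_predict(query, training, k=3):
    """Estimate a property as the arithmetic mean over the k training
    compounds closest to `query` (Euclidean distance in descriptor space).

    This is an assumed, simplified reading of a local model, not the
    method published in the paper.
    """
    ranked = sorted(training, key=lambda item: math.dist(query, item[0]))
    nearest = ranked[:k]
    return sum(value for _, value in nearest) / len(nearest)


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical (descriptor vector, log solubility) pairs; not real data.
training = [
    ((1.0, 0.2), -1.0),
    ((1.2, 0.1), -1.4),
    ((3.0, 0.9), -3.0),
    ((3.1, 1.0), -3.2),
]

# The two nearest neighbours of the query are the first two compounds.
estimate = knn_mean_predict((1.1, 0.15), training, k=2)
```

The sketch also shows why local models depend on closely related neighbours: if the nearest training compounds lie far from the query in descriptor space, the mean is taken over dissimilar chemicals and the estimate becomes unreliable.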