Wang Junmei, Krudy George, Hou Tingjun, Zhang Wei, Holland George, Xu Xiaojie
Encysive Pharmaceuticals Inc., 7000 Fannin Street, Houston, Texas 77030, USA.
J Chem Inf Model. 2007 Jul-Aug;47(4):1395-404. doi: 10.1021/ci700096r. Epub 2007 Jun 15.
In this work, two reliable aqueous solubility models, ASMS (aqueous solubility based on molecular surface) and ASMS-LOGP (aqueous solubility based on molecular surface using ClogP as a descriptor), were constructed from atom-type-classified solvent accessible surface areas and several molecular descriptors for a diverse data set of 1708 molecules. For ASMS (which does not use ClogP as a descriptor), the leave-one-out q² and root-mean-square error (RMSE) were 0.872 and 0.748 log units, respectively. ASMS-LOGP performed slightly better than ASMS (q² = 0.886, RMSE = 0.705 log units). Both models were extensively validated with three cross-validation tests, and encouraging predictive ability was achieved. High-throughput aqueous solubility prediction was then carried out for a number of data sets extracted from several widely used databases. We found that real drugs are about 20-fold more soluble than the so-called druglike molecules in the ZINC database, which do not violate any of Lipinski's "Rule of 5" criteria. Specifically, oral drugs are about 16-fold more soluble, while injectable drugs are 50-60-fold more soluble. If the threshold for a molecule to be considered soluble is set at -5 log units, about 85% of real drugs are predicted to be soluble; in contrast, only 50% of the druglike molecules in ZINC are. We conclude that the two models could serve as a rule in druglike analysis and as an efficient filter for prioritizing compound libraries prior to high-throughput screening (HTS).
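As a rough illustration of the statistics quoted above, the sketch below computes a leave-one-out q² and RMSE for a generic multiple linear regression, converts an n-fold solubility ratio into log units, and counts the fraction of molecules passing a -5 log-unit solubility threshold. The ordinary least-squares form, the function names (loo_q2_rmse, fold_to_log_units, fraction_soluble), and the choice to count a value of exactly -5 as soluble are assumptions made for illustration; this is not the authors' published implementation or descriptor set.

```python
import numpy as np


def loo_q2_rmse(X, y):
    """Leave-one-out q^2 and RMSE for an OLS model with an intercept.

    Assumption: a plain linear model. X is an (n, p) descriptor matrix
    (e.g. hypothetical atom-type SASA values plus other descriptors) and
    y holds n experimental log S values.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # prepend an intercept column
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i  # hold out molecule i
        coef, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
        preds[i] = A[i] @ coef  # predict the held-out molecule
    press = np.sum((y - preds) ** 2)  # predictive residual sum of squares
    q2 = 1.0 - press / np.sum((y - y.mean()) ** 2)
    rmse = np.sqrt(press / n)
    return q2, rmse


def fold_to_log_units(fold):
    """An n-fold solubility ratio corresponds to log10(n) log units."""
    return float(np.log10(fold))


def fraction_soluble(log_s, threshold=-5.0):
    """Fraction of molecules whose predicted log S is at or above the
    threshold (treating a value equal to the threshold as soluble is an
    assumption)."""
    return float(np.mean(np.asarray(log_s, dtype=float) >= threshold))
```

For example, fold_to_log_units(20) gives about 1.30, so the reported 20-fold gap between real drugs and ZINC druglike molecules corresponds to a shift of roughly 1.3 log units in predicted solubility.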