Baba Hiromi, Takahara Jun-ichi, Mamitsuka Hiroshi
Kyoto R&D Center, Maruho Co., Ltd., Shimogyo-ku, Kyoto, Japan
Pharm Res. 2015 Jul;32(7):2360-71. doi: 10.1007/s11095-015-1629-y. Epub 2015 Jan 24.
Predicting the human skin permeability of chemical compounds accurately and efficiently is useful for developing dermatological medicines and cosmetics. However, previous work has two problems: 1) the quality of the databases used, and 2) the methods used to build prediction models. In this paper, we attempt to solve both problems.
We first compile a novel dataset of chemical compounds by carefully screening the literature, keeping only permeability coefficients measured under consistent experimental conditions. We then apply machine learning techniques such as support vector regression (SVR) and random forest (RF) to this database to develop prediction models. Molecular descriptors are obtained entirely by computation, and greedy stepwise selection is used to choose among them. The prediction models are validated both internally and externally.
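The descriptor-selection step can be illustrated with a minimal sketch of greedy forward stepwise selection. This is an assumption about the variant used: the abstract does not say whether selection runs forward, backward, or in both directions, nor what criterion or stopping rule is applied. Here `score` is any higher-is-better function of a descriptor subset; in a study like this one it would plausibly be an internally cross-validated R² of an SVR or RF model trained on those descriptors. The descriptor names, `TOY_GAINS`, and `toy_score` are hypothetical placeholders, not values from the paper.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple


def greedy_forward_selection(
    candidates: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    max_features: Optional[int] = None,
) -> Tuple[List[str], float]:
    """Greedily add descriptors one at a time while the score improves.

    `score` must return a higher-is-better value for a descriptor subset,
    e.g. a cross-validated R^2 (an assumption, not the paper's stated criterion).
    """
    selected: List[str] = []
    remaining = list(candidates)
    best_score = float("-inf")
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Score every one-descriptor extension of the current subset.
        trials = [(score(frozenset(selected + [c])), c) for c in remaining]
        trial_score, trial_desc = max(trials)
        if trial_score <= best_score:
            break  # no single addition improves the score, so stop
        selected.append(trial_desc)
        remaining.remove(trial_desc)
        best_score = trial_score
    return selected, best_score


# Hypothetical toy criterion: fixed per-descriptor gains minus a size penalty.
# In practice this would be replaced by a cross-validated model score.
TOY_GAINS = {"logP": 0.5, "MW": 0.3, "HBD": 0.05, "TPSA": -0.1}


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(TOY_GAINS[d] for d in subset) - 0.08 * len(subset)


if __name__ == "__main__":
    print(greedy_forward_selection(list(TOY_GAINS), toy_score))
```

With the toy criterion, the procedure picks `logP`, then `MW`, and stops because adding `HBD` would lower the score. Each step costs one model evaluation per remaining descriptor, which is why greedy search is preferred over exhaustive subset search when many descriptors are computed.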
We generated a new, original database of human skin permeability for 211 different compounds from aqueous donors. Among linear SVR, nonlinear SVR, and RF, nonlinear SVR achieved the best performance. In external validation, nonlinear SVR achieved a coefficient of determination (R²) of 0.910, a root mean square error (RMSE) of 0.342, and a mean absolute error (MAE) of 0.282.
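The three external-validation statistics can be computed as in the sketch below. It uses the common definitions R² = 1 − SS_res/SS_tot, RMSE = square root of the mean squared residual, and MAE = mean absolute residual. The abstract does not state its exact formulas, and some QSAR studies report the squared Pearson correlation as R² instead, so treat this definition as an assumption. The log kp values in the example are made up for illustration.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return R^2 (1 - SS_res/SS_tot), RMSE and MAE for paired predictions."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
    }


if __name__ == "__main__":
    # Hypothetical observed vs. predicted log kp values (not data from the paper).
    observed = [-3.0, -2.0, -1.0]
    predicted = [-2.5, -2.0, -1.5]
    print(regression_metrics(observed, predicted))
    # R2 = 0.75, RMSE ≈ 0.408, MAE ≈ 0.333
```

Because RMSE squares the residuals before averaging, it is always at least as large as MAE; the reported pair (0.342 vs. 0.282) is consistent with that.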
We provided one of the largest datasets of purely experimental log kp values and developed reliable, accurate prediction models for screening active ingredients and for searching for not-yet-synthesized candidate compounds for dermatological medicines and cosmetics.