Prediction of aqueous intrinsic solubility of druglike molecules using Random Forest regression trained with Wiki-pS0 database.

Author information

Avdeef Alex

Affiliation

in-ADME Research, 1732 First Avenue #102, New York, NY 10128 USA.

Publication information

ADMET DMPK. 2020 Mar 4;8(1):29-77. doi: 10.5599/admet.766. eCollection 2020.

Abstract

The accurate prediction of drug solubility is still problematic. It has long been thought that the shortfalls were due to the lack of high-quality solubility data from the chemical space of drugs. This study considers the quality of solubility data, particularly of ionizable drugs. A database is described, comprising 6355 entries of intrinsic solubility for 3014 different molecules, drawn from 1325 citations. An earlier publication discussed many factors affecting measurement quality and offered suggestions for extracting more reliable information from legacy data. Many of those suggestions have been implemented in this study. By correcting solubility for ionization (i.e., deriving the intrinsic solubility, S0) and by normalizing temperature (transforming measurements performed in the range 10-50 °C to 25 °C), the average interlaboratory reproducibility can now be estimated as 0.17 log unit. At best, empirical methods for predicting solubility have achieved a root mean square error (RMSE) of about 0.6 log unit. Three prediction methods are compared here: (a) Yalkowsky's general solubility equation (GSE), (b) the Abraham solvation equation (ABSOLV), and (c) Random Forest regression (RFR) statistical machine learning. The latter two methods were trained using the new database. As anticipated, the RFR method outperforms the other two models. However, predicting drug solubility to the level of the quality of the data is still out of reach. Data quality is not the limiting factor in prediction, and the statistical machine learning methodologies are probably up to the task. What may be missing are solubility data from a few sparsely covered regions of drug chemical space (particularly research compounds).
Also, new descriptors that better differentiate the factors affecting solubility between molecules could be critical for narrowing the gap between the accuracy of the prediction models and that of the experimental data.
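Two of the calculations named in the abstract can be made concrete with a short, hedged sketch. This is a minimal Python illustration, not the paper's actual pipeline. It assumes a simple monoprotic acid or base obeying the Henderson-Hasselbalch equation (no aggregation, salt precipitation, or multiple pKa values) to back out log S0 from a solubility measured at a known pH. It also encodes the commonly cited form of Yalkowsky's GSE, log S0 = 0.5 - 0.01(mp - 25) - log P (mp in °C, S0 in mol/L, melting-point term set to zero for compounds melting below 25 °C), and the RMSE used to compare predictions with observations. The function names and example values are hypothetical.

```python
import math
from typing import Sequence


def log_s0_from_log_s(log_s: float, ph: float, pka: float, kind: str) -> float:
    """Back out intrinsic solubility log S0 from log S measured at a given pH.

    Assumes a monoprotic acid or base following Henderson-Hasselbalch:
      acid: S = S0 * (1 + 10**(pH - pKa))
      base: S = S0 * (1 + 10**(pKa - pH))
    Real data (aggregates, salts, multiprotic drugs) need a fuller model.
    """
    if kind == "acid":
        return log_s - math.log10(1.0 + 10.0 ** (ph - pka))
    if kind == "base":
        return log_s - math.log10(1.0 + 10.0 ** (pka - ph))
    raise ValueError("kind must be 'acid' or 'base'")


def gse_log_s0(mp_celsius: float, log_p: float) -> float:
    """Yalkowsky's general solubility equation, returning log S0 in log mol/L.

    The melting-point term is taken as zero for compounds melting below 25 C.
    """
    mp_term = 0.01 * max(mp_celsius - 25.0, 0.0)
    return 0.5 - mp_term - log_p


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error between predicted and observed log S0 values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical weak acid: log S = -3.0 measured at pH 7.0 with pKa 7.0.
    # At pH = pKa half the dissolved drug is ionized, so S0 = S / 2.
    print(round(log_s0_from_log_s(-3.0, 7.0, 7.0, "acid"), 3))  # -3.301
    # Hypothetical solid: mp 125 C, log P 2.0 gives 0.5 - 1.0 - 2.0.
    print(gse_log_s0(125.0, 2.0))  # -2.5
    # Hypothetical predicted vs. observed log S0 pairs.
    print(rmse([-2.5, -4.0], [-3.0, -4.0]))
```

The ionization correction is what turns a pH-dependent measurement into the pH-independent S0 that a database of intrinsic solubilities stores. The GSE needs only a melting point and log P and, unlike ABSOLV and RFR in this study, has no parameters fitted to the new database.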
