Prediction of aqueous intrinsic solubility of druglike molecules using Random Forest regression trained with Wiki-pS0 database.

Author information

Alex Avdeef

Affiliation

in-ADME Research, 1732 First Avenue #102, New York, NY 10128 USA.

Publication information

ADMET DMPK. 2020 Mar 4;8(1):29-77. doi: 10.5599/admet.766. eCollection 2020.

DOI: 10.5599/admet.766

PMID: 35299775

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8915599/

Abstract

The accurate prediction of drug solubility is still problematic. For a long time, it was thought that the shortfalls were due to the lack of high-quality solubility data from the chemical space of drugs. This study considers the quality of solubility data, particularly of ionizable drugs. A database is described, comprising 6355 entries of intrinsic solubility for 3014 different molecules, drawn from 1325 citations. An earlier publication discussed many factors affecting the quality of the measurement and offered suggestions for extracting more reliable information from legacy data. Many of those suggestions have been implemented in this study. By correcting solubility for ionization (i.e., deriving the intrinsic solubility, S0) and by normalizing temperature (transforming measurements performed in the range 10-50 °C to 25 °C), it can now be estimated that the average interlaboratory reproducibility is 0.17 log unit. At best, empirical methods for predicting solubility have hovered around a root mean square error (RMSE) of 0.6 log unit. Three prediction methods are compared here: (a) Yalkowsky's general solubility equation (GSE), (b) the Abraham solvation equation (ABSOLV), and (c) Random Forest regression (RFR) statistical machine learning. The latter two methods were trained using the new database. As anticipated, the RFR method outperforms the other two models. However, predicting the solubility of drugs with an accuracy matching the quality of the data is still out of reach. Data quality is not the limiting factor in prediction, and statistical machine learning methodologies are probably up to the task. Possibly what is missing are solubility data from a few sparsely covered regions of the chemical space of drugs (particularly of research compounds). Also, new descriptors that better differentiate the factors affecting solubility between molecules could be critical for narrowing the gap between the accuracy of the prediction models and that of the experimental data.
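
Deriving intrinsic solubility means removing the contribution of the charged species from a solubility measured at a given pH. The sketch below is a minimal illustration, assuming ideal Henderson-Hasselbalch behaviour for a molecule with a single ionizable group. The function name `log_intrinsic_solubility` and the example pKa, pH and log S values are hypothetical, and the study's own correction may also account for effects this ideal model ignores (for example, drug-buffer aggregation or salt precipitation).

```python
import math


def log_intrinsic_solubility(log_s: float, ph: float, pka: float, kind: str) -> float:
    """Estimate log S0 from a solubility (log S, mol/L) measured at a given pH.

    Assumes an ideal Henderson-Hasselbalch model for one ionizable group:
      acid: S = S0 * (1 + 10**(pH - pKa))
      base: S = S0 * (1 + 10**(pKa - pH))
    """
    if kind == "acid":
        exponent = ph - pka
    elif kind == "base":
        exponent = pka - ph
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    # Dividing S by the ionization factor leaves only the neutral-species solubility.
    return log_s - math.log10(1.0 + 10.0 ** exponent)


# Hypothetical base (pKa 9.0) measured at pH 7.0 with log S = -2.0:
# it is about 99% ionized, so log S0 is roughly two log units lower.
print(round(log_intrinsic_solubility(-2.0, 7.0, 9.0, "base"), 3))  # prints -4.004
```

For an acid measured at pH = pKa, the correction is log10(2), about 0.30 log unit. When the pH lies far on the side that favours the neutral form, the correction vanishes and log S approaches log S0.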

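Temperature normalization can be sketched with the integrated van't Hoff equation, assuming the enthalpy of solution ΔH stays constant between the measurement temperature T and 25 °C: log S0(25 °C) = log S0(T) - ΔH/(R ln 10) · (1/298.15 - 1/T). The abstract does not say how the enthalpies for the 10-50 °C measurements were obtained, so here ΔH is simply an input. The function name `normalize_log_s0_to_25c` and the example values are hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)
T_REF = 298.15   # 25 °C in kelvin


def normalize_log_s0_to_25c(log_s0_t: float, t_celsius: float, dh_sol_kj_per_mol: float) -> float:
    """Shift log S0 measured at t_celsius to 25 °C (van't Hoff, constant dH assumed)."""
    t_kelvin = t_celsius + 273.15
    dh = dh_sol_kj_per_mol * 1000.0  # kJ/mol -> J/mol
    # ln S(T_ref) - ln S(T) = -(dH/R) * (1/T_ref - 1/T); divide by ln 10 for log10 units.
    return log_s0_t - dh / (R * math.log(10.0)) * (1.0 / T_REF - 1.0 / t_kelvin)


# Hypothetical: log S0 = -3.00 measured at 37 °C with an assumed dH = +30 kJ/mol.
# Dissolution is endothermic, so the 25 °C value comes out lower.
print(round(normalize_log_s0_to_25c(-3.00, 37.0, 30.0), 2))  # prints -3.2
```

The sign convention matters: with a positive (endothermic) ΔH, a value measured above 25 °C is shifted down and one measured below 25 °C is shifted up.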

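The GSE needs no training. In its commonly cited form, it estimates log S0 (mol/L) from the melting point (mp, °C) and the octanol-water partition coefficient: log S0 = 0.5 - 0.01(mp - 25) - log P, with the melting-point term dropped for liquids. Competing models are then compared by their RMSE against measured values. The sketch below is minimal: the function names `gse_log_s0` and `rmse` and all compound data are hypothetical, and the study's ABSOLV and RFR models are not reproduced.

```python
import math
from typing import Sequence


def gse_log_s0(mp_celsius: float, log_p: float) -> float:
    """Yalkowsky GSE estimate of log S0 (mol/L)."""
    # The melting-point term applies only to solids; for liquids it is set to zero.
    mp_term = 0.01 * max(mp_celsius - 25.0, 0.0)
    return 0.5 - mp_term - log_p


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error, in log units when the inputs are log S0 values."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical compounds: (melting point in °C, log P, measured log S0).
data = [(150.0, 2.5, -3.4), (80.0, 1.2, -1.0), (210.0, 4.0, -6.1)]
preds = [gse_log_s0(mp, log_p) for mp, log_p, _ in data]
obs = [log_s0 for _, _, log_s0 in data]
print(f"GSE RMSE: {rmse(preds, obs):.2f} log unit")
```

The printed number reflects only these made-up inputs. For the real models, an RMSE well above the roughly 0.17 log unit interlaboratory reproducibility means that measurement noise alone does not explain the prediction error.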
Similar articles

Prediction of aqueous intrinsic solubility of druglike molecules using Random Forest regression trained with Wiki-pS0 database.

ADMET DMPK. 2020 Mar 4;8(1):29-77. doi: 10.5599/admet.766. eCollection 2020.

Predicting Solubility of Newly-Approved Drugs (2016-2020) with a Simple ABSOLV and GSE() Consensus Model Outperforming Random Forest Regression.

J Solution Chem. 2022;51(9):1020-1055. doi: 10.1007/s10953-022-01141-7. Epub 2022 Feb 7.

Mechanistically transparent models for predicting aqueous solubility of rigid, slightly flexible, and very flexible drugs (MW<2000): Accuracy near that of random forest regression.

ADMET DMPK. 2023 Aug 21;11(3):317-330. doi: 10.5599/admet.1879. eCollection 2023.

"Flexible-Acceptor" General Solubility Equation for beyond Rule of 5 Drugs.

Mol Pharm. 2020 Oct 5;17(10):3930-3940. doi: 10.1021/acs.molpharmaceut.0c00689. Epub 2020 Sep 4.

Can small drugs predict the intrinsic aqueous solubility of 'beyond Rule of 5' big drugs?

ADMET DMPK. 2020 Apr 25;8(3):180-206. doi: 10.5599/admet.794. eCollection 2020.

Multi-lab intrinsic solubility measurement reproducibility in CheqSol and shake-flask methods.

ADMET DMPK. 2019 Jun 5;7(3):210-219. doi: 10.5599/admet.698. eCollection 2019.

Revisiting the general solubility equation: in silico prediction of aqueous solubility incorporating the effect of topographical polar surface area.

J Chem Inf Model. 2012 Feb 27;52(2):420-8. doi: 10.1021/ci200387c. Epub 2012 Jan 13.

In silico Prediction of Aqueous Solubility: a Comparative Study of Local and Global Predictive Models.

Mol Inform. 2015 Jun;34(6-7):417-30. doi: 10.1002/minf.201400144. Epub 2015 Jun 18.

Uniting cheminformatics and chemical theory to predict the intrinsic aqueous solubility of crystalline druglike molecules.

J Chem Inf Model. 2014 Mar 24;54(3):844-56. doi: 10.1021/ci4005805. Epub 2014 Mar 11.

Comparative Analysis of Chemical Descriptors by Machine Learning Reveals Atomistic Insights into Solute-Lipid Interactions.

Mol Pharm. 2024 Jul 1;21(7):3343-3355. doi: 10.1021/acs.molpharmaceut.4c00080. Epub 2024 May 23.

Cited by

Advancing Aqueous Solubility Prediction: A Machine Learning Approach for Organic Compounds Using a Curated Data Set.

J Chem Inf Model. 2025 Aug 25;65(16):8426-8434. doi: 10.1021/acs.jcim.4c02399. Epub 2025 Aug 10.

Physics-Based Solubility Prediction for Organic Molecules.

Chem Rev. 2025 Aug 13;125(15):7057-7098. doi: 10.1021/acs.chemrev.4c00855. Epub 2025 Jul 29.

Application of high-precision solubility prediction models in the assisted design of drug-like compounds.

Mol Divers. 2025 May 27. doi: 10.1007/s11030-025-11239-x.

Machine Learning-Based Prediction of Drug Solubility in Lipidic Environments: The Sol_ME Tool for Optimizing Lipid-Based Formulations with a Preliminary Apalutamide Case Study.

AAPS PharmSciTech. 2025 Feb 3;26(2):50. doi: 10.1208/s12249-025-03051-5.

Establishing a Pharmacoinformatics Repository of Approved Medicines: A Database to Support Drug Product Development.

Mol Pharm. 2025 Jan 6;22(1):408-423. doi: 10.1021/acs.molpharmaceut.4c00991. Epub 2024 Dec 20.

Identification of AChE targeted therapeutic compounds for Alzheimer's disease: an in-silico study with DFT integration.

Sci Rep. 2024 Dec 5;14(1):30356. doi: 10.1038/s41598-024-81285-2.

Effect of Data Quality and Data Quantity on the Estimation of Intrinsic Solubility: Analysis Based on a Single-Source Data Set.

Mol Pharm. 2024 Oct 7;21(10):5261-5271. doi: 10.1021/acs.molpharmaceut.4c00685. Epub 2024 Sep 13.

Will we ever be able to accurately predict solubility?

Sci Data. 2024 Mar 18;11(1):303. doi: 10.1038/s41597-024-03105-6.

Designing solvent systems using self-evolving solubility databases and graph neural networks.

Chem Sci. 2023 Dec 8;15(3):923-939. doi: 10.1039/d3sc03468b. eCollection 2024 Jan 17.

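Mechanistically transparent models for predicting aqueous solubility of rigid, slightly flexible, and very flexible drugs (MW<2000): Accuracy near that of random forest regression.

ADMET DMPK. 2023 Aug 21;11(3):317-330. doi: 10.5599/admet.1879. eCollection 2023.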

References

Perspectives in solubility measurement and interpretation.

ADMET DMPK. 2019 Apr 5;7(2):88-105. doi: 10.5599/admet.686. eCollection 2019.

Solubility Challenge Revisited after Ten Years, with Multilab Shake-Flask Data, Using Tight (SD ∼ 0.17 log) and Loose (SD ∼ 0.62 log) Test Sets.

J Chem Inf Model. 2019 Jun 24;59(6):3036-3040. doi: 10.1021/acs.jcim.9b00345. Epub 2019 May 9.

Solubility-pH profile of desipramine hydrochloride in saline phosphate buffer: Enhanced solubility due to drug-buffer aggregates.

Eur J Pharm Sci. 2019 May 15;133:264-274. doi: 10.1016/j.ejps.2019.03.014. Epub 2019 Mar 23.

Synthesis and Characterization of a Biomimetic Formulation of Clofazimine Hydrochloride Microcrystals for Parenteral Administration.

Pharmaceutics. 2018 Nov 17;10(4):238. doi: 10.3390/pharmaceutics10040238.

Investigating the effects of amphipathic gastrointestinal compounds on the solution behaviour of salt and free base forms of clofazimine: An in vitro evaluation.

Int J Pharm. 2018 Dec 1;552(1-2):180-192. doi: 10.1016/j.ijpharm.2018.09.012. Epub 2018 Sep 17.

Human intestinal fluid factors affecting intestinal drug permeation in vitro.

Eur J Pharm Sci. 2018 Aug 30;121:338-346. doi: 10.1016/j.ejps.2018.06.007. Epub 2018 Jun 15.

Solubility determination of raloxifene hydrochloride in ten pure solvents at various temperatures: Thermodynamics-based analysis and solute-solvent interactions.

Int J Pharm. 2018 Jun 10;544(1):165-171. doi: 10.1016/j.ijpharm.2018.04.024. Epub 2018 Apr 18.

Effect of vinylpyrrolidone polymers on the solubility and supersaturation of drugs; a study using the Cheqsol method.

Eur J Pharm Sci. 2018 May 30;117:227-235. doi: 10.1016/j.ejps.2018.02.025. Epub 2018 Feb 23.

Reverse Engineering the Intracellular Self-Assembly of a Functional Mechanopharmaceutical Device.

Sci Rep. 2018 Feb 13;8(1):2934. doi: 10.1038/s41598-018-21271-7.

Solubility Determination of Active Pharmaceutical Ingredients Which Have Been Recently Added to the List of Essential Medicines in the Context of the Biopharmaceutics Classification System-Biowaiver.

J Pharm Sci. 2018 Jun;107(6):1478-1488. doi: 10.1016/j.xphs.2018.01.025. Epub 2018 Feb 6.
