Will we ever be able to accurately predict solubility?

Affiliations

Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France.

IDD/CADD, Sanofi, Vitry-Sur-Seine, France.

Publication information

Sci Data. 2024 Mar 18;11(1):303. doi: 10.1038/s41597-024-03105-6.

DOI: 10.1038/s41597-024-03105-6
PMID: 38499581
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10948805/
Abstract

Accurate prediction of thermodynamic solubility by machine learning remains a challenge. Recent models often report good performance, but their apparent reliability can be deceptive when they are used prospectively. This study investigates the origins of these discrepancies along three directions: a historical perspective, an analysis of the aqueous solubility dataverse, and data quality. We investigated over 20 years of published solubility datasets and models, highlighting overlooked datasets and the overlaps between popular sets. We benchmarked recently published models on a novel curated solubility dataset and report poor performance. We also propose a workflow to curate aqueous solubility data, aiming to produce models that are useful to bench chemists. Our results demonstrate that some state-of-the-art models are not ready for public usage because they lack a well-defined applicability domain and overlook historical data sources. We report the impact of factors that influence the utility of the models: interlaboratory standard deviation, ionic state of the solute, and data sources. The models obtained here and the quality-assessed datasets are publicly available.
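
The abstract names interlaboratory standard deviation as one factor limiting model utility. As a rough illustration only, and not the authors' workflow, the sketch below shows one way such a spread could be estimated from replicate measurements of the same compound reported by different sources. The records, compound identifiers, source names, and the helper functions `replicate_stats` and `pooled_sd` are all hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

# Hypothetical records: (compound identifier, data source, logS in log10 mol/L).
# These values are invented for illustration; they are not taken from the paper.
records = [
    ("cmpd_A", "source_1", -3.10),
    ("cmpd_A", "source_2", -3.50),
    ("cmpd_A", "source_3", -3.30),
    ("cmpd_B", "source_1", -1.20),
    ("cmpd_B", "source_2", -1.20),
    ("cmpd_C", "source_1", -5.00),
]


def replicate_stats(rows):
    """Return {compound: (n, mean logS, sample SD)} for compounds with >= 2 values."""
    by_compound = defaultdict(list)
    for compound, _source, log_s in rows:
        by_compound[compound].append(log_s)
    stats = {}
    for compound, values in by_compound.items():
        # A single measurement carries no information about between-source spread.
        if len(values) >= 2:
            stats[compound] = (len(values), mean(values), stdev(values))
    return stats


def pooled_sd(stats):
    """Pool the per-compound sample variances, weighting each by n - 1."""
    numerator = sum((n - 1) * sd ** 2 for n, _mean, sd in stats.values())
    denominator = sum(n - 1 for n, _mean, _sd in stats.values())
    return sqrt(numerator / denominator) if denominator else float("nan")


stats = replicate_stats(records)
pooled = pooled_sd(stats)
for compound, (n, avg, sd) in sorted(stats.items()):
    print(f"{compound}: n={n}, mean logS={avg:.2f}, SD={sd:.2f}")
print(f"pooled SD across replicated compounds: {pooled:.3f} log units")
```

A pooled value of this kind gives a rough lower bound on the error a model can be expected to reach on the same data, under the assumption that the replicate values describe the same quantity (same temperature, pH, and solid form).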

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/898f/10948805/cfc3f4eb6ba3/41597_2024_3105_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/898f/10948805/bf09e7fe09c4/41597_2024_3105_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/898f/10948805/ead1e4f0d183/41597_2024_3105_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/898f/10948805/4b418764c957/41597_2024_3105_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/898f/10948805/58da6185dc0c/41597_2024_3105_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/898f/10948805/e2c3997e5bd2/41597_2024_3105_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/898f/10948805/0d29fb02da98/41597_2024_3105_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/898f/10948805/dd2629712073/41597_2024_3105_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/898f/10948805/000424448770/41597_2024_3105_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/898f/10948805/ea6d90ffaf71/41597_2024_3105_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/898f/10948805/4de6c22369b2/41597_2024_3105_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/898f/10948805/10c3afa27eea/41597_2024_3105_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/898f/10948805/8efa8331a5ad/41597_2024_3105_Fig13_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/898f/10948805/718d4e1040fe/41597_2024_3105_Fig14_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/898f/10948805/fbe8a2ae1476/41597_2024_3105_Fig15_HTML.jpg
