
Prediction of small-molecule compound solubility in organic solvents by machine learning algorithms.

Authors

Ye Zhuyifan, Ouyang Defang

Affiliation

State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China.

Publication information

J Cheminform. 2021 Dec 11;13(1):98. doi: 10.1186/s13321-021-00575-3.

DOI: 10.1186/s13321-021-00575-3
PMID: 34895323
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8665485/
Abstract

Rapid solvent selection is of great significance in chemistry. However, solubility prediction remains a crucial challenge. This study aimed to develop machine learning models that can accurately predict compound solubility in organic solvents. A dataset containing 5081 experimental temperature and solubility data points for compounds in organic solvents was extracted and standardized. Molecular fingerprints were selected to characterize structural features. LightGBM was compared with deep learning and traditional machine learning methods (PLS, Ridge regression, kNN, DT, ET, RF, SVM) to develop models for predicting solubility in organic solvents at different temperatures. Compared to the other models, LightGBM exhibited significantly better overall generalization (logS ± 0.20). For unseen solutes, the model gave a prediction accuracy (logS ± 0.59) close to the expected noise level of experimental solubility data. LightGBM also revealed the physicochemical relationship between solubility and structural features. The method enables rapid solvent screening in chemistry and may be applied to solubility prediction in other solvents.
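
The abstract reports one error figure for overall generalization and a separate one for unseen solutes. A minimal sketch of how such an "unseen solute" evaluation could be set up is shown below. It is not the authors' procedure, which the abstract does not describe: the record layout, the function names `split_by_solute` and `mean_absolute_error`, the toy values, the use of a mean-value baseline in place of a trained model, and the reading of the ± figure as a mean absolute error in logS units are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_solute(records, test_fraction=0.2, seed=0):
    """Hold out whole solutes so that no test solute appears in training.

    Hypothetical helper: each record is a dict with a "solute" key.
    """
    by_solute = defaultdict(list)
    for rec in records:
        by_solute[rec["solute"]].append(rec)
    solutes = sorted(by_solute)
    random.Random(seed).shuffle(solutes)
    n_test = max(1, round(len(solutes) * test_fraction))
    test = [r for s in solutes[:n_test] for r in by_solute[s]]
    train = [r for s in solutes[n_test:] for r in by_solute[s]]
    return train, test


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted logS."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy records with made-up values (not data from the paper).
records = [
    {"solute": "solute_A", "solvent": "solvent_X", "temp_K": 298.15, "logS": -1.2},
    {"solute": "solute_A", "solvent": "solvent_Y", "temp_K": 313.15, "logS": -0.8},
    {"solute": "solute_B", "solvent": "solvent_X", "temp_K": 298.15, "logS": -2.5},
    {"solute": "solute_B", "solvent": "solvent_Y", "temp_K": 313.15, "logS": -2.1},
    {"solute": "solute_C", "solvent": "solvent_X", "temp_K": 298.15, "logS": -0.3},
    {"solute": "solute_C", "solvent": "solvent_Y", "temp_K": 313.15, "logS": 0.1},
    {"solute": "solute_D", "solvent": "solvent_X", "temp_K": 298.15, "logS": -1.9},
    {"solute": "solute_D", "solvent": "solvent_Y", "temp_K": 313.15, "logS": -1.5},
]

train, test = split_by_solute(records, test_fraction=0.25)

# Stand-in for a trained model: predict the mean training logS everywhere.
baseline = sum(r["logS"] for r in train) / len(train)
mae = mean_absolute_error([r["logS"] for r in test], [baseline] * len(test))
print(f"held-out solutes: {sorted({r['solute'] for r in test})}, MAE = {mae:.2f}")
```

Grouping by solute before splitting is what separates the two kinds of figure: a purely random split lets the same solute appear on both sides, which generally makes the task easier than predicting for a compound the model has never seen.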


Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/494c/8665485/48af3707be70/13321_2021_575_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/494c/8665485/d2c9f75c511b/13321_2021_575_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/494c/8665485/5b9ee59c925f/13321_2021_575_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/494c/8665485/418cd38cfa61/13321_2021_575_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/494c/8665485/616f5be70910/13321_2021_575_Fig5_HTML.jpg
