Enhancing Accuracy and Feature Insights in Hydration Free Energy Predictions for Small Molecules with Machine Learning.

Author information

Han Mingjun, Zhang Yukai, Yu Taotao, Du Guodong, Yam ChiYung, Tang Ho-Kin

Affiliations

School of Science, Harbin Institute of Technology, Shenzhen 518055, China.

Shenzhen Key Laboratory of Advanced Functional Carbon Materials Research and Comprehensive Application, Shenzhen 518055, China.

Publication information

ACS Omega. 2025 Jul 2;10(27):29781-29792. doi: 10.1021/acsomega.5c04249. eCollection 2025 Jul 15.

DOI:10.1021/acsomega.5c04249

PMID:40687018

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12268727/

Abstract

Accurately predicting solvation free energy and understanding its physical determinants are essential for studying solute behavior in solution. This work employs advanced machine learning techniques to enhance predictive accuracy and extract insights into the solvation free energy of small molecules. Traditional machine learning approaches, compared to deep learning, are lightweight and require fewer computational resources. Our analysis identifies molecular geometry and topology as critical factors in predicting alchemical free energy, aligning with the theory that surface tension is a key determinant, while highlighting the role of charge distribution in improving force field designs for molecular dynamics. We propose an improved machine learning scheme that integrates K-nearest neighbors for feature processing, ensemble modeling, and dimensionality reduction. This scheme achieves a mean unsigned error of 0.53 kcal/mol on the FreeSolv data set using only two-dimensional features without pretraining on large databases, offering substantial accuracy improvements. This lightweight approach provides a viable alternative to computationally intensive deep learning models and holds promise for broad applications in chemical predictions.
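The accuracy figure quoted in the abstract is a mean unsigned error (MUE), that is, the average absolute difference between predicted and reference free energies. The following is a minimal sketch of how that metric is computed; the function name `mean_unsigned_error` and the four sample values are hypothetical illustrations, not the authors' code and not FreeSolv data.

```python
from statistics import fmean


def mean_unsigned_error(predicted, reference):
    """Return the mean of |predicted - reference| over paired values."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("at least one pair of values is required")
    return fmean(abs(p - r) for p, r in zip(predicted, reference))


# Hypothetical hydration free energies in kcal/mol (assumed values,
# chosen only to illustrate the arithmetic; not taken from FreeSolv).
reference = [-6.0, -3.5, 1.0, -9.0]
predicted = [-5.5, -4.0, 1.5, -8.0]

mue = mean_unsigned_error(predicted, reference)
print(f"MUE = {mue:.3f} kcal/mol")
```

Because the errors are taken as absolute values, over- and under-predictions do not cancel, so the metric reflects the typical size of the deviation in the same units as the free energy itself.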


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f16/12268727/8ef13dbe6f0b/ao5c04249_0001.jpg
