Enhancing Accuracy and Feature Insights in Hydration Free Energy Predictions for Small Molecules with Machine Learning.

Authors

Han Mingjun, Zhang Yukai, Yu Taotao, Du Guodong, Yam ChiYung, Tang Ho-Kin

Affiliations

School of Science, Harbin Institute of Technology, Shenzhen 518055, China.

Shenzhen Key Laboratory of Advanced Functional Carbon Materials Research and Comprehensive Application, Shenzhen 518055, China.

Publication information

ACS Omega. 2025 Jul 2;10(27):29781-29792. doi: 10.1021/acsomega.5c04249. eCollection 2025 Jul 15.

Abstract

Accurately predicting solvation free energy and understanding its physical determinants are essential for studying solute behavior in solution. This work employs advanced machine learning techniques to enhance predictive accuracy and extract insights into the solvation free energy of small molecules. Traditional machine learning approaches, compared to deep learning, are lightweight and require fewer computational resources. Our analysis identifies molecular geometry and topology as critical factors in predicting alchemical free energy, aligning with the theory that surface tension is a key determinant, while highlighting the role of charge distribution in improving force field designs for molecular dynamics. We propose an improved machine learning scheme that integrates K-nearest neighbors for feature processing, ensemble modeling, and dimensionality reduction. This scheme achieves a mean unsigned error of 0.53 kcal/mol on the FreeSolv data set using only two-dimensional features without pretraining on large databases, offering substantial accuracy improvements. This lightweight approach provides a viable alternative to computationally intensive deep learning models and holds promise for broad applications in chemical predictions.
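The abstract reports accuracy as a mean unsigned error (MUE) in kcal/mol and names ensemble modeling as one component of the scheme. The sketch below only illustrates how an MUE is computed and how averaging two models' predictions can lower it. The function names and all numeric values are hypothetical, invented for illustration; they are not FreeSolv data and not the authors' implementation, and the paper's actual ensemble may combine models differently.

```python
def mean_unsigned_error(y_true, y_pred):
    """Mean unsigned (absolute) error, in the same units as the inputs."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def ensemble_average(predictions_per_model):
    """Average per-molecule predictions across models (simple mean ensemble)."""
    return [sum(column) / len(column) for column in zip(*predictions_per_model)]


# Hypothetical hydration free energies in kcal/mol (illustrative only).
reference = [-1.0, -3.5, 0.5, -6.0]
model_a = [-1.4, -3.0, 0.9, -5.2]
model_b = [-0.8, -3.8, 0.3, -6.4]

ensemble = ensemble_average([model_a, model_b])

for name, pred in [("model_a", model_a), ("model_b", model_b), ("ensemble", ensemble)]:
    print(f"{name}: MUE = {mean_unsigned_error(reference, pred):.3f} kcal/mol")
```

In this toy case the two models err in opposite directions, so their average lands closer to the reference than either model alone. That cancellation is the usual motivation for ensembling, though it is not guaranteed for arbitrary models.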

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f16/12268727/8ef13dbe6f0b/ao5c04249_0001.jpg
