Han Mingjun, Zhang Yukai, Yu Taotao, Du Guodong, Yam ChiYung, Tang Ho-Kin
School of Science, Harbin Institute of Technology, Shenzhen 518055, China.
Shenzhen Key Laboratory of Advanced Functional Carbon Materials Research and Comprehensive Application, Shenzhen 518055, China.
ACS Omega. 2025 Jul 2;10(27):29781-29792. doi: 10.1021/acsomega.5c04249. eCollection 2025 Jul 15.
Accurately predicting solvation free energy and understanding its physical determinants are essential for studying solute behavior in solution. This work applies machine learning techniques to improve predictive accuracy and to gain insight into the solvation free energy of small molecules. Compared with deep learning, traditional machine learning approaches are lightweight and require fewer computational resources. Our analysis identifies molecular geometry and topology as critical factors in predicting the alchemical free energy, consistent with the theory that surface tension is a key determinant, and highlights the role of charge distribution in improving force field design for molecular dynamics. We propose an improved machine learning scheme that integrates K-nearest neighbors (KNN) for feature processing, ensemble modeling, and dimensionality reduction. This scheme achieves a mean unsigned error of 0.53 kcal/mol on the FreeSolv data set using only two-dimensional (2D) features and without pretraining on large databases, a substantial gain in accuracy. This lightweight approach provides a viable alternative to computationally intensive deep learning models and holds promise for broad application in chemical prediction tasks.
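The abstract names the components of the scheme but not how they are wired together. The sketch below is a minimal illustration under our own assumptions, not the authors' method. It assumes that the KNN feature-processing step appends to each molecule's 2D descriptor vector the mean solvation free energy of its k nearest training molecules, excluding the molecule itself during training to avoid target leakage. It uses an average of KNN regressors with different k as a stand-in for the ensemble, omits dimensionality reduction entirely, and scores predictions with the mean unsigned error (MUE). All function names and the toy data are hypothetical.

```python
import math
from statistics import mean


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_mean_target(query, train_X, train_y, k, skip=None):
    """Mean target of the k nearest training rows; `skip` excludes one row index."""
    ranked = sorted(
        (euclidean(query, x), i) for i, x in enumerate(train_X) if i != skip
    )
    return mean(train_y[i] for _, i in ranked[:k])


def add_knn_feature(X, train_X, train_y, k=2, is_train=False):
    """Hypothetical KNN feature step: append the neighbor-mean target to each row.

    For training rows, the row itself is skipped so its own label does not leak
    into its feature.
    """
    return [
        list(x) + [knn_mean_target(x, train_X, train_y, k, skip=i if is_train else None)]
        for i, x in enumerate(X)
    ]


def ensemble_predict(query, train_X, train_y, ks=(1, 2, 3)):
    """Stand-in ensemble: average the predictions of KNN regressors with different k."""
    return mean(knn_mean_target(query, train_X, train_y, k) for k in ks)


def mean_unsigned_error(y_true, y_pred):
    """MUE: average absolute deviation, in the units of y (kcal/mol here)."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Toy 2D descriptors and solvation free energies (invented numbers, not FreeSolv).
train_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
train_y = [-1.0, -2.0, -3.0, -8.0]
test_X = [[0.1, 0.1]]
test_y = [-2.0]

# Augment both sets with the KNN-derived feature, then predict with the ensemble.
aug_train = add_knn_feature(train_X, train_X, train_y, k=2, is_train=True)
aug_test = add_knn_feature(test_X, train_X, train_y, k=2)
pred = ensemble_predict(aug_test[0], aug_train, train_y)
mue = mean_unsigned_error(test_y, [pred])

print(f"ensemble prediction: {pred:.3f} kcal/mol")  # -> -2.333
print(f"MUE: {mue:.3f} kcal/mol")  # -> 0.333
```

Excluding each training molecule from its own neighbor set matters: without it, the appended feature would partly encode the label being predicted and would overstate accuracy. A real pipeline would also standardize descriptors before computing distances, since the appended feature is in kcal/mol while the descriptors are not.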