Yadav Ajeet Kumar, Prakash Marvin V, Bandyopadhyay Pradipta
School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India.
J Phys Chem B. 2025 Feb 6;129(5):1640-1647. doi: 10.1021/acs.jpcb.4c07090. Epub 2025 Jan 22.
The hydration free energy (HFE) of a molecule is a fundamental property with importance throughout chemistry and biology. Calculating the HFE with classical molecular dynamics simulation-based approaches can be challenging and expensive. Machine learning (ML) models are increasingly being used to predict HFE. Although the accuracy of ML models on small-molecule data sets is impressive, these models suffer from a lack of interpretability. In this work, we have developed a physics-based ML model with only six descriptors, which is both accurate and fully interpretable, and applied it to a database of small-molecule HFEs. We evaluated the electrostatic energy using an approximate closed form of the Generalized Born (GB) model and used the polar surface area as a second descriptor. The remaining descriptors are log P, the numbers of hydrogen bond acceptors and donors, and the number of rotatable bonds. We trained different ML models, such as random forest and extreme gradient boosting. The best result from these models has a mean absolute error of only 0.74 kcal/mol. The main strength of this model is that the descriptors have clear physical meaning: the descriptors for the electrostatics and the polar surface area, followed by the hydrogen bond donors and acceptors, were found to be the most important factors in the calculation of the hydration free energy.
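The abstract does not state which approximate closed form of the GB model was used. As an illustration only, the sketch below evaluates the standard pairwise GB expression of Still and co-workers, which may differ from the authors' descriptor. The function name, the input layout, and the default dielectric constants are assumptions made for this example.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2); standard conversion factor.
COULOMB_KCAL = 332.0637


def gb_polar_energy(charges, coords, born_radii,
                    eps_solvent=78.5, eps_solute=1.0):
    """Polar solvation energy (kcal/mol) from the standard Still GB formula.

    Hypothetical helper, not the authors' implementation.
    charges: partial charges in units of e
    coords: (x, y, z) tuples in Angstrom
    born_radii: effective Born radii in Angstrom (assumed to be given)
    """
    prefactor = -0.5 * COULOMB_KCAL * (1.0 / eps_solute - 1.0 / eps_solvent)
    total = 0.0
    n = len(charges)
    # Double sum over all pairs, including the self terms i == j.
    for i in range(n):
        for j in range(n):
            r2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            rr = born_radii[i] * born_radii[j]
            # f_GB interpolates between the Born radius (r = 0)
            # and the plain Coulomb distance (large r).
            f_gb = math.sqrt(r2 + rr * math.exp(-r2 / (4.0 * rr)))
            total += charges[i] * charges[j] / f_gb
    return prefactor * total


# Usage: a single unit charge with a 2 Angstrom Born radius.
ion_energy = gb_polar_energy([1.0], [(0.0, 0.0, 0.0)], [2.0])
```

For a single charge the double sum collapses to the Born equation for an ion, which gives a quick sanity check on the implementation.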