Pan Xiaolin, Zhang Xudong, Xia Song, Zhang Yingkai
Department of Chemistry, New York University, New York, New York 10003, United States.
Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States.
J Chem Theory Comput. 2025 Mar 25;21(6):3132-3141. doi: 10.1021/acs.jctc.5c00041. Epub 2025 Mar 16.
Tautomerization plays a critical role in chemical and biological processes, influencing molecular stability, reactivity, biological activity, and ADME-Tox properties. Many drug-like molecules exist in multiple tautomeric states in aqueous solution, complicating the study of protein-ligand interactions. Rapid and accurate prediction of tautomer ratios and identification of predominant species are therefore crucial in computational drug discovery. In this study, we introduce sPhysNet-Taut, a deep learning model fine-tuned on experimental data using a Siamese neural network architecture. This model directly predicts tautomer ratios in aqueous solution based on MMFF94-optimized molecular geometries. On experimental test sets, sPhysNet-Taut achieves state-of-the-art performance with root-mean-square error (RMSE) of 1.9 kcal/mol on the 100-tautomers set and 1.0 kcal/mol on the SAMPL2 challenge, outperforming all other methods. It also provides superior ranking power for tautomer pairs on multiple test sets. Our results demonstrate that fine-tuning on experimental data significantly enhances model performance compared to training from scratch. This work not only offers a valuable deep learning model for predicting tautomer ratios but also presents a protocol for modeling pairwise data. To promote usability, we have developed an accessible tool that predicts stable tautomeric states in aqueous solution by enumerating all possible tautomeric states and ranking them using our model. The source code and web server are freely accessible at https://github.com/xiaolinpan/sPhysNet-Taut and https://yzhang.hpc.nyu.edu/tautomer.