Luo Weiliang, Zhou Gengmo, Zhu Zhengdan, Yuan Yannan, Ke Guolin, Wei Zhewei, Gao Zhifeng, Zheng Hang
Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.
DP Technology, Beijing 100089, China.
JACS Au. 2024 Jul 17;4(9):3451-3465. doi: 10.1021/jacsau.4c00271. eCollection 2024 Sep 23.
Integrating scientific principles into machine learning models to enhance their predictive performance and generalizability is a central challenge in the development of AI for Science. Herein, we introduce Uni-pKa, a novel framework that incorporates thermodynamic principles into machine learning modeling and achieves high-precision predictions of acid dissociation constants (pKa). Predicting pKa is a crucial task in the rational design of drugs and catalysts, as well as a modeling challenge in computational physical chemistry for small organic molecules. Uni-pKa uses a comprehensive free energy model to represent molecular protonation equilibria accurately. It combines a structure enumerator, which reconstructs molecular configurations from pKa data, with a neural network that serves as a free energy predictor; together they enable high-throughput, data-driven prediction while preserving thermodynamic consistency. Trained with a pretraining-finetuning strategy on both predicted and experimental pKa data, Uni-pKa not only achieves state-of-the-art accuracy among chemoinformatics methods but also shows precision comparable to that of quantum mechanics-based methods.
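A free energy model of protonation equilibria ties the free energies of individual microstates to a measurable macroscopic pKa. The sketch below shows one standard way to write that link: Boltzmann-summing the microstate free energies within each protonation macrostate, then converting the free energy difference between macrostates into a pKa. It is not taken from the paper. The kcal/mol units, the function names `macrostate_free_energy` and `macro_pka`, and the convention that the proton's free energy is already absorbed into the supplied microstate energies are all assumptions made for illustration.

```python
import math

# Gas constant in kcal/(mol*K); kcal/mol units are an assumption of this sketch.
R_KCAL = 1.987204e-3


def macrostate_free_energy(microstate_energies, temperature=298.15):
    """Combine microstate free energies (kcal/mol) into one macrostate free energy.

    G_macro = -RT * ln(sum_i exp(-G_i / RT)), evaluated with the log-sum-exp
    trick so that large energies do not overflow or underflow.
    """
    rt = R_KCAL * temperature
    scaled = [-g / rt for g in microstate_energies]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -rt * log_sum


def macro_pka(protonated, deprotonated, temperature=298.15):
    """Macroscopic pKa between two adjacent protonation macrostates.

    pKa = (G_deprotonated - G_protonated) / (RT ln 10).
    Hypothetical convention: the free energy of the released proton is assumed
    to be folded into the microstate energies passed in.
    """
    rt = R_KCAL * temperature
    g_prot = macrostate_free_energy(protonated, temperature)
    g_deprot = macrostate_free_energy(deprotonated, temperature)
    return (g_deprot - g_prot) / (rt * math.log(10))


if __name__ == "__main__":
    rt = R_KCAL * 298.15
    dg = 4.0 * rt * math.log(10)  # free energy gap chosen to give pKa = 4
    print(macro_pka([0.0], [dg]))       # one protonated microstate
    print(macro_pka([0.0, 0.0], [dg]))  # two degenerate protonated tautomers
```

With a single microstate per macrostate the result is 4.0. Adding a second, degenerate protonated tautomer raises the macroscopic pKa by log10(2) ≈ 0.30, which illustrates why a model that enumerates microstates can stay thermodynamically consistent where one that predicts a single pKa per molecule may not.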