Ulanov Evgeni, Qadir Ghulam A, Riedmiller Kai, Friederich Pascal, Gräter Frauke
Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.
Max Planck Institute for Polymer Research, Mainz, Germany.
Digit Discov. 2025 Jan 10;4(2):513-522. doi: 10.1039/d4dd00174e. eCollection 2025 Feb 12.
Predicting reaction barriers for arbitrary configurations from only a limited set of density functional theory (DFT) calculations would make the design of catalysts and the simulation of reactions within complex materials far more efficient. Here we propose Gaussian process regression (GPR) as a method of choice when the DFT data are limited to hundreds or thousands of barrier calculations. For hydrogen atom transfer in proteins, an important reaction in chemistry and biology, we obtain a mean absolute error of 3.23 kcal mol⁻¹ for the range of barriers in the data set using SOAP descriptors, and similar values using the marginalized graph kernel. Thus, the two GPR models can robustly estimate reaction barriers within the large chemical and conformational space of proteins. Their predictive power is comparable to that of a graph neural network-based model, and GPR even outperforms the latter in the low-data regime. We propose GPR as a valuable tool for an approximate but data-efficient model of chemical reactivity in a complex and highly variable environment.
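To make the regression step concrete, the following is a minimal sketch of GPR with a squared-exponential (RBF) kernel, written in plain Python. It is not the authors' pipeline: the two-dimensional feature vectors stand in for real SOAP descriptors, `toy_barrier` is an arbitrary smooth function rather than DFT data, and the `length_scale` and `noise` values are illustrative assumptions, so the printed error says nothing about the 3.23 kcal mol⁻¹ reported above.

```python
import math
import random


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential similarity between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_gpr(X, y, length_scale=1.0, noise=1e-4):
    """Precompute alpha = (K + noise * I)^-1 (y - mean) for a constant-mean GP."""
    mean = sum(y) / len(y)
    K = [[rbf_kernel(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    alpha = solve_cholesky(cholesky(K), [v - mean for v in y])
    return {"X": X, "alpha": alpha, "mean": mean, "length_scale": length_scale}


def predict_gpr(model, X_new):
    """Posterior mean: constant mean plus kernel-weighted sum over training points."""
    return [model["mean"] + sum(rbf_kernel(x, xt, model["length_scale"]) * a
                                for xt, a in zip(model["X"], model["alpha"]))
            for x in X_new]


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def toy_barrier(x):
    """Hypothetical smooth 'barrier' surface; an assumption, not chemistry."""
    return 20.0 + 10.0 * math.sin(x[0]) + 5.0 * x[1] ** 2


rng = random.Random(0)  # fixed seed so the toy data are reproducible
X_train = [[rng.uniform(-2, 2), rng.uniform(-2, 2)] for _ in range(60)]
y_train = [toy_barrier(x) for x in X_train]
X_test = [[rng.uniform(-2, 2), rng.uniform(-2, 2)] for _ in range(20)]
y_test = [toy_barrier(x) for x in X_test]

model = fit_gpr(X_train, y_train, length_scale=1.0, noise=1e-4)
mae = mean_absolute_error(y_test, predict_gpr(model, X_test))

# Baseline for comparison: always predict the mean training barrier.
baseline_mae = mean_absolute_error(y_test, [model["mean"]] * len(y_test))
print(f"toy GPR MAE: {mae:.3f}, mean-predictor MAE: {baseline_mae:.3f}")
```

In practice one would replace the hand-written linear algebra with an established GPR library, optimize the kernel hyperparameters, and swap the RBF kernel on toy features for a kernel on SOAP descriptors or for the marginalized graph kernel.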