Low Kaycee, Coote Michelle L, Izgorodina Ekaterina I
Monash Computational Chemistry Group, School of Chemistry, Monash University, Clayton, Victoria 3800, Australia.
Institute for Nanoscale Science and Technology, College of Science and Engineering, Flinders University, Bedford Park, South Australia 5042, Australia.
J Chem Theory Comput. 2023 Mar 14;19(5):1466-1475. doi: 10.1021/acs.jctc.2c00984. Epub 2023 Feb 14.
This work extends the electron deformation density-based descriptor, originally developed in the electron deformation density-based interaction energy machine learning (EDDIE-ML) algorithm to predict dimer interaction energies, to the prediction of three-body interactions in trimers. Using a sequential learning process to select the training data, the resulting Gaussian process regression (GPR) model predicts the three-body interaction energy within 0.2 kcal mol⁻¹ of the SRS-MP2/cc-pVTZ reference values for the 3B69 and S22-3 trimer data sets. A hybrid kernel function is introduced that combines contributions from the average and individual atomic environments, allowing the same descriptor to predict the total trimer interaction energy as well as the three-body contribution. To extend the range and diversity of trimer interaction energies available in the literature, a new data set based on a protein-ligand crystal structure is introduced, consisting of 509 structures of a central ligand with two protein fragments. Benchmark calculations are provided for the new data set, which contains charged fragments and significantly larger molecular interactions than existing databases in the literature. Compared with density functional theory (DFT)- and wavefunction-based methods for calculating the three-body interaction energy, our model makes predictions in significantly less time by reducing the number of required SCF calculations at the PBE0 level of theory from 7 to 4, showcasing the utility and efficiency of our Δ-ML method, particularly for larger systems.
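For reference, the conventional route that the ML model is compared against uses the many-body expansion: the non-additive three-body energy of a trimer ABC is ΔE_3B = E_ABC − (E_AB + E_AC + E_BC) + (E_A + E_B + E_C), and the total interaction energy is E_int = E_ABC − (E_A + E_B + E_C). Evaluating ΔE_3B this way takes one trimer, three dimer and three monomer calculations, consistent with the seven SCF calculations mentioned in the abstract. The four calculations retained by the ML approach are presumably the trimer and its three monomers (the subsystems a deformation density ρ_ABC − ρ_A − ρ_B − ρ_C would need), although the abstract does not state this. The minimal sketch below uses hypothetical function names and toy energies that are not data from the paper, and it ignores counterpoise (basis-set superposition) corrections.

```python
from itertools import combinations

FRAGMENTS = "ABC"  # hypothetical labels for the three monomers of the trimer


def required_calculations(fragments=FRAGMENTS):
    """List every subsystem needed for the conventional three-body energy."""
    return ["".join(c) for n in (1, 2, 3) for c in combinations(fragments, n)]


def total_interaction_energy(energies, fragments=FRAGMENTS):
    """E_int = E_ABC - (E_A + E_B + E_C)."""
    return energies[fragments] - sum(energies[m] for m in fragments)


def three_body_energy(energies, fragments=FRAGMENTS):
    """Non-additive part: E_ABC - sum(dimer energies) + sum(monomer energies)."""
    dimers = sum(energies["".join(p)] for p in combinations(fragments, 2))
    monomers = sum(energies[m] for m in fragments)
    return energies[fragments] - dimers + monomers


# Toy energies (arbitrary units): monomers, pairwise terms of -0.1/-0.2/-0.3,
# and a built-in three-body term of -0.05.
toy = {"A": -1.0, "B": -2.0, "C": -3.0,
       "AB": -3.1, "AC": -4.2, "BC": -5.3,
       "ABC": -6.65}
print(len(required_calculations()))             # 7
print(round(three_body_energy(toy), 6))         # -0.05
print(round(total_interaction_energy(toy), 6))  # -0.65
```

The toy values are built so that the pairwise terms cancel exactly, leaving only the three-body term that was put in by hand.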
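The abstract does not give the functional form of the hybrid kernel. One plausible reading, sketched here purely as an assumption, is a weighted sum of two kernels over per-atom descriptor vectors: one that compares the averaged atomic environment of each system, and one that averages pairwise comparisons between individual atomic environments. The Gaussian (RBF) base kernel, the `weight` and `length_scale` parameters, and all function names are hypothetical, and the plain per-atom vectors stand in for whatever deformation-density descriptor EDDIE-ML actually uses.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rbf(x: Vector, y: Vector, length_scale: float = 1.0) -> float:
    """Gaussian (RBF) kernel between two descriptor vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def mean_vector(envs: Sequence[Vector]) -> List[float]:
    """Average the per-atom descriptor vectors of one system."""
    n = len(envs)
    return [sum(column) / n for column in zip(*envs)]


def average_env_kernel(X, Y, length_scale=1.0):
    """Compare two systems through their averaged atomic environments."""
    return rbf(mean_vector(X), mean_vector(Y), length_scale)


def individual_env_kernel(X, Y, length_scale=1.0):
    """Mean of all pairwise comparisons between individual atomic environments."""
    total = sum(rbf(x, y, length_scale) for x in X for y in Y)
    return total / (len(X) * len(Y))


def hybrid_kernel(X, Y, weight=0.5, length_scale=1.0):
    """Hypothetical hybrid: weighted sum of the average and individual terms."""
    return (weight * average_env_kernel(X, Y, length_scale)
            + (1.0 - weight) * individual_env_kernel(X, Y, length_scale))


# Two toy "systems", each a list of per-atom descriptor vectors
# (they may contain different numbers of atoms).
X = [[0.0, 1.0], [1.0, 0.0]]
Y = [[0.5, 0.5], [1.0, 1.0], [0.0, 0.0]]
print(hybrid_kernel(X, Y))
```

Both terms are positive semi-definite (an RBF applied to the mean vectors, and a normalized set kernel over atom pairs), so any non-negative weighting of them is still a valid covariance function for GPR.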