Yuan Yongna, Mills Matthew J L, Popelier Paul L A
Manchester Institute of Biotechnology (MIB), 131 Princess Street, Manchester, M1 7DN, Great Britain.
J Mol Model. 2014 Apr;20(4):2172. doi: 10.1007/s00894-014-2172-1. Epub 2014 Mar 16.
A multipolar, polarizable electrostatic method for future use in a novel force field is described. Quantum Chemical Topology (QCT) is used to partition the electron density of a chemical system into atoms; the machine learning method Kriging is then used to build models that relate the multipole moments of each atom to the positions of its surrounding nuclei. Serine serves as the pilot system for studying the influence of both the level of theory and the data generator method. Three data generator methods are compared: (i) sampling of protein structures deposited in the Protein Data Bank (PDB), and (ii) normal mode distortion along either (a) Cartesian coordinates or (b) redundant internal coordinates. Wavefunctions for the sampled geometries were obtained at the HF/6-31G(d,p), B3LYP/apc-1, and MP2/cc-pVDZ levels of theory, and the atomic multipole moments were then calculated by volume integration. At the HF/6-31G(d,p) level, the average absolute error (over an independent test set of conformations) in the total atom-atom electrostatic interaction energy of serine, using Kriging models built with the three data generator methods, is 11.3 kJ mol⁻¹ (PDB), 8.2 kJ mol⁻¹ (Cartesian distortion), and 10.1 kJ mol⁻¹ (redundant internal distortion). At the B3LYP/apc-1 level, the respective errors are 7.7 kJ mol⁻¹, 6.7 kJ mol⁻¹, and 4.9 kJ mol⁻¹, while at the MP2/cc-pVDZ level they are 6.5 kJ mol⁻¹, 5.3 kJ mol⁻¹, and 4.0 kJ mol⁻¹. The ranges of geometries generated by redundant internal coordinate distortion and by extraction from the PDB are much wider than the range generated by Cartesian distortion. The atomic multipole moment and electrostatic interaction energy predictions at the B3LYP/apc-1 and MP2/cc-pVDZ levels are similar, and both are better than the corresponding predictions at the HF/6-31G(d,p) level.
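As an illustration of the Kriging step, the following is a minimal sketch of a constant-mean Kriging predictor evaluated by average absolute error on an independent test set. It is not the authors' implementation: the Gaussian correlation function, the fixed `theta` values, the two-dimensional feature vectors, and the synthetic target `toy_moment` are all assumptions made for the example, and no quantum chemical data is used.

```python
import numpy as np


def gaussian_correlation(A, B, theta):
    """Correlation matrix with entries exp(-sum_k theta_k * (a_k - b_k)^2)."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-np.sum(theta * diff ** 2, axis=-1))


def fit_kriging(X, y, theta, nugget=1e-10):
    """Fit a constant-mean Kriging model; returns the mean and the weights."""
    n = len(X)
    # A small nugget on the diagonal keeps the linear solves numerically stable.
    R = gaussian_correlation(X, X, theta) + nugget * np.eye(n)
    ones = np.ones(n)
    # Generalised least-squares estimate of the constant mean.
    mu = (ones @ np.linalg.solve(R, y)) / (ones @ np.linalg.solve(R, ones))
    weights = np.linalg.solve(R, y - mu)
    return mu, weights


def predict_kriging(X_new, X, theta, mu, weights):
    """Kriging prediction: mean plus correlation-weighted training residuals."""
    return mu + gaussian_correlation(X_new, X, theta) @ weights


def mean_absolute_error(pred, ref):
    """Average absolute error between predictions and reference values."""
    return float(np.mean(np.abs(pred - ref)))


def toy_moment(X):
    """Hypothetical smooth target standing in for an atomic multipole moment."""
    return np.sin(X[:, 0]) + 0.5 * np.cos(X[:, 1])


# Training geometries: a 5 x 5 grid over two hypothetical geometric features.
grid = np.linspace(-1.0, 1.0, 5)
X_train = np.array([[a, b] for a in grid for b in grid])
y_train = toy_moment(X_train)

# Independent test geometries drawn inside the sampled range.
rng = np.random.default_rng(0)
X_test = rng.uniform(-1.0, 1.0, size=(20, 2))
y_test = toy_moment(X_test)

theta = np.array([4.0, 4.0])  # assumed fixed; normally optimised during training
mu, weights = fit_kriging(X_train, y_train, theta)

pred_train = predict_kriging(X_train, X_train, theta, mu, weights)
pred_test = predict_kriging(X_test, X_train, theta, mu, weights)

mae_test = mean_absolute_error(pred_test, y_test)
mae_baseline = mean_absolute_error(np.full_like(y_test, y_train.mean()), y_test)

print("test MAE (Kriging):", mae_test)
print("test MAE (constant baseline):", mae_baseline)
```

In this sketch the model reproduces its training values and is scored only on geometries it has not seen, which mirrors the independent test set of conformations mentioned in the abstract. In practice the `theta` parameters would be optimised rather than fixed, and a separate model would be trained for each multipole moment of each atom.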