Max Planck Institute for Polymer Research, Ackermannweg 10, 55128 Mainz, Germany.
Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853, USA.
J Chem Phys. 2018 Jun 28;148(24):241706. doi: 10.1063/1.5009502.
Classical intermolecular potentials typically require an extensive parametrization procedure for every new compound considered. To do away with prior parametrization, we propose a combination of physics-based potentials with machine learning (ML), coined IPML, which is transferable across small neutral organic and biologically relevant molecules. ML models provide on-the-fly predictions of environment-dependent local atomic properties: electrostatic multipole coefficients (with a significant error reduction compared to previous reports), the population and decay rate of valence atomic densities, and polarizabilities, across conformations and chemical compositions of H, C, N, and O atoms. These parameters enable accurate calculations of the intermolecular contributions: electrostatics, charge penetration, repulsion, induction/polarization, and many-body dispersion. Unlike other potentials, this model is transferable in its ability to handle new molecules and conformations without explicit prior parametrization: all local atomic properties are predicted by ML, leaving only eight global parameters, optimized once and for all across compounds. We validate IPML on various gas-phase dimers at and away from equilibrium separation, obtaining mean absolute errors between 0.4 and 0.7 kcal/mol for several chemically and conformationally diverse datasets representative of non-covalent interactions in biologically relevant molecules. We further focus on hydrogen-bonded complexes (essential but challenging due to their directional nature), where datasets of DNA base pairs and amino acids yield an extremely encouraging 1.4 kcal/mol error. Finally, as a first look, we consider IPML for denser systems: water clusters, supramolecular host-guest complexes, and the benzene crystal.
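To make the electrostatic term and the reported error metric concrete, here is a minimal sketch, not the IPML implementation. IPML uses ML-predicted multipole coefficients and adds charge penetration, repulsion, induction, and many-body dispersion; this example keeps only the monopole (point-charge) term of the multipole expansion between two rigid molecules, using per-atom charges that would, by assumption, come from an ML model. The function names `monopole_electrostatics` and `mean_absolute_error` are hypothetical, and the Coulomb constant is the commonly used value of about 332.0637 kcal·Å/(mol·e²).

```python
import numpy as np

# Approximate Coulomb constant in kcal*Angstrom / (mol*e^2)
COULOMB_KCAL = 332.0637


def monopole_electrostatics(q_a, r_a, q_b, r_b):
    """Point-charge electrostatic energy between molecules A and B (kcal/mol).

    Hypothetical helper: only the monopole term of a multipole expansion,
    with no charge penetration or polarization. Charges in e, positions in Angstrom.
    """
    q_a = np.asarray(q_a, dtype=float)
    q_b = np.asarray(q_b, dtype=float)
    r_a = np.asarray(r_a, dtype=float)
    r_b = np.asarray(r_b, dtype=float)
    # Pairwise intermolecular distances, shape (n_a, n_b), via broadcasting
    d = np.linalg.norm(r_a[:, None, :] - r_b[None, :, :], axis=-1)
    # Sum q_i * q_j / r_ij over all intermolecular atom pairs
    return float(COULOMB_KCAL * np.sum(np.outer(q_a, q_b) / d))


def mean_absolute_error(predicted, reference):
    """Mean absolute error between predicted and reference energies."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(predicted - reference)))


# Toy usage: a +1 e and a -1 e charge separated by 1 Angstrom
e_elec = monopole_electrostatics([1.0], [[0.0, 0.0, 0.0]],
                                 [-1.0], [[1.0, 0.0, 0.0]])
print(e_elec)  # about -332.06 kcal/mol

# Toy benchmark: MAE over two made-up dimer energies (kcal/mol)
print(mean_absolute_error([-5.0, -2.5], [-4.5, -3.0]))
```

Truncating at the monopole is the simplest illustration; the anisotropy of hydrogen bonds mentioned above is exactly what higher multipoles and the additional physical terms are meant to capture.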