Department of Chemistry, Duke University, Durham, North Carolina 27708, USA.
J Chem Phys. 2023 Jul 14;159(2). doi: 10.1063/5.0142280.
Molecular dynamics (MD) is a powerful and widely used approach to understanding chemical processes in proteins in atomic detail. The accuracy of MD results depends strongly on the force field. Currently, molecular mechanical (MM) force fields are mainly used in MD simulations because of their low computational cost. Quantum mechanical (QM) calculations are highly accurate but exceedingly time-consuming for protein simulations. Machine learning (ML) can generate accurate potentials at the QM level with little additional computational effort for specific systems that can be studied at the QM level. However, constructing general machine-learned force fields, which are needed for broad applications and for large, complex systems, remains challenging. Here, general and transferable neural network (NN) force fields based on CHARMM force fields, named CHARMM-NN, are constructed for proteins by training NN models on 27 fragments generated by the residue-based systematic molecular fragmentation (rSMF) method. The NN for each fragment is based on atom types and uses new input features similar to MM inputs, including bonds, angles, dihedrals, and non-bonded terms. These features enhance the compatibility of CHARMM-NN with MM MD and enable CHARMM-NN force fields to be implemented in different MD programs. While the main part of the protein energy is obtained from rSMF and the NNs, the non-bonded interactions between fragments and with water are taken from the CHARMM force field through mechanical embedding. Validation on dipeptides, covering geometric data, relative potential energies, and structural reorganization energies, demonstrates that the CHARMM-NN local minima on the potential energy surface are very accurate approximations to the QM minima, showing the success of CHARMM-NN for bonded interactions.
However, MD simulations of peptides and proteins indicate that future improvements of CHARMM-NN should consider more accurate methods for representing protein-water interactions in fragments and non-bonded interactions between fragments, which could increase the accuracy of the approximation beyond the current mechanical embedding QM/MM level.
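The energy composition described above (fragment energies from NN models, plus non-bonded terms taken from an MM force field through mechanical embedding) can be illustrated with a minimal Python sketch. It is not the CHARMM-NN implementation: the `Fragment` class, the signed `coefficient` weights, the function `combined_energy`, and all numerical values are hypothetical placeholders. The sketch assumes that a fragmentation scheme expresses the bonded energy as a signed sum over fragments, with negative weights removing doubly counted overlap regions.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Fragment:
    """A hypothetical fragment with a signed weight in the energy sum."""
    name: str
    coefficient: int  # assumed: +1 for a fragment, -1 for an overlap region


def combined_energy(
    fragments: Sequence[Fragment],
    fragment_model: Callable[[Fragment], float],
    mm_nonbonded: float,
) -> float:
    """Signed sum of per-fragment model energies plus an MM non-bonded term.

    `fragment_model` stands in for a trained per-fragment NN, and
    `mm_nonbonded` stands in for inter-fragment and solvent interactions
    evaluated separately with an MM force field (mechanical embedding).
    """
    nn_part = sum(f.coefficient * fragment_model(f) for f in fragments)
    return nn_part + mm_nonbonded


# Toy stand-in for the NN models: a lookup table of made-up energies.
TOY_ENERGIES = {"frag_a": -10.0, "frag_b": -12.0, "overlap_ab": -4.0}


def toy_fragment_model(fragment: Fragment) -> float:
    return TOY_ENERGIES[fragment.name]


fragments = [
    Fragment("frag_a", +1),
    Fragment("frag_b", +1),
    Fragment("overlap_ab", -1),  # subtract the region counted twice
]

energy = combined_energy(fragments, toy_fragment_model, mm_nonbonded=-1.5)
print(energy)
```

In this arrangement the NN models never see the atoms outside their own fragment, which is consistent with the abstract's observation that protein-water and inter-fragment non-bonded interactions are limited to the accuracy of the MM description.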