Department of Chemical and Biological Engineering, University of New Mexico, Albuquerque, New Mexico.
Center for Computing Research, Sandia National Laboratories, Albuquerque, New Mexico.
Biophys J. 2022 Oct 18;121(20):3883-3895. doi: 10.1016/j.bpj.2022.08.045. Epub 2022 Sep 3.
A fundamental limitation in accurately modeling biomolecules such as DNA is that quantum chemistry calculations cannot be performed on large molecular structures. We present a machine learning model based on an equivariant Euclidean neural network framework that yields accurate ab initio electron densities for arbitrary DNA structures that are much too large for conventional quantum methods. The model is trained on representative B-DNA base-pair steps that capture both base pairing and base stacking interactions. It produces accurate electron densities for arbitrary B-DNA structures, with typical errors of less than 1%. Crucially, the error does not increase with system size, which suggests that the model can extrapolate to large DNA structures with negligible loss of accuracy. The model also generalizes reasonably well to other DNA structural motifs, such as the A- and Z-DNA forms, despite being trained only on B-DNA configurations. We use the model to calculate electron densities of several large-scale DNA structures and show that its computational cost scales essentially linearly with system size. We also show that this machine learning electron density model can be used to calculate accurate electrostatic potentials for DNA. These electrostatic potentials are more accurate than those obtained from classical force fields and do not show the usual deficiencies at short range.
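The abstract quotes density errors as percentages but does not define the metric. As a minimal sketch, assuming the error is the integrated absolute difference between the predicted and reference densities divided by the total reference electron count, it could be evaluated on a real-space grid as shown below. The function name `density_percent_error`, the grid, and the toy Gaussian density are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def density_percent_error(rho_pred, rho_ref, voxel_volume=1.0):
    """Integrated absolute density difference, as a percentage of the
    integrated reference density.

    ASSUMPTION: this normalization is one common choice; the abstract
    does not state which error definition the authors use.
    """
    rho_pred = np.asarray(rho_pred, dtype=float)
    rho_ref = np.asarray(rho_ref, dtype=float)
    if rho_pred.shape != rho_ref.shape:
        raise ValueError("densities must be sampled on the same grid")
    # Riemann-sum approximations of the two integrals over the grid.
    abs_diff = np.abs(rho_pred - rho_ref).sum() * voxel_volume
    total = rho_ref.sum() * voxel_volume
    return 100.0 * abs_diff / total


# Toy data (hypothetical): a Gaussian "reference" density and a
# "prediction" that is uniformly 0.5% too large.
axis = np.linspace(-4.0, 4.0, 41)
x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
rho_ref = np.exp(-(x**2 + y**2 + z**2))
rho_pred = 1.005 * rho_ref
voxel_volume = (axis[1] - axis[0]) ** 3

error = density_percent_error(rho_pred, rho_ref, voxel_volume)
print(f"density error: {error:.3f}%")  # 0.5% by construction
```

Because the difference is normalized by the total electron count, a metric of this form can be compared across structures of different sizes, which is the kind of comparison behind the statement that the error does not increase with system size.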