Airas Justin, Ding Xinqiang, Zhang Bin
Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA.
bioRxiv. 2023 Sep 12:2023.09.08.556923. doi: 10.1101/2023.09.08.556923.
Coarse-grained (CG) force fields are essential for molecular dynamics simulations of biomolecules, striking a balance between computational efficiency and biological realism. These simulations employ simplified models grouping atoms into interaction sites, enabling the study of complex biomolecular systems over biologically relevant timescales. Efforts are underway to develop accurate and transferable CG force fields, guided by a bottom-up approach that matches the CG energy function with the potential of mean force (PMF) defined by the finer system. However, practical challenges arise due to many-body effects, lack of analytical expressions for the PMF, and limitations in parameterizing CG force fields. To address these challenges, a machine learning-based approach is proposed, utilizing graph neural networks (GNNs) to represent CG force fields and potential contrasting for parameterization from atomistic simulation data. We demonstrate the effectiveness of the approach by deriving a transferable GNN implicit solvent model using 600,000 atomistic configurations of six proteins obtained from explicit solvent simulations. The GNN model provides solvation free energy estimations much more accurately than state-of-the-art implicit solvent models, reproducing configurational distributions of explicit solvent simulations. We also demonstrate the reasonable transferability of the GNN model outside the training data. Our study offers valuable insights for building accurate coarse-grained models bottom-up.
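For reference, the bottom-up target mentioned above can be written out explicitly. The sketch below uses notation that is assumed here and does not appear in the abstract: r for atomistic coordinates, u(r) for the atomistic potential energy, M for the mapping from atomistic to CG coordinates, R for CG coordinates, and U_theta for the parameterized CG energy function (for example, a GNN). It states the standard textbook definition of the PMF, not the specific parameterization procedure used by the authors.

```latex
% Potential of mean force defined by the fine-grained system
% (standard definition; notation assumed for illustration)
\[
  U_{\mathrm{PMF}}(\mathbf{R})
    = -k_{B}T \,\ln \int \delta\!\big(M(\mathbf{r}) - \mathbf{R}\big)\,
      e^{-u(\mathbf{r})/k_{B}T}\, d\mathbf{r} \;+\; \text{const}
\]

% Bottom-up goal: the CG model reproduces the mapped atomistic distribution
\[
  U_{\theta}(\mathbf{R}) \approx U_{\mathrm{PMF}}(\mathbf{R})
  \quad\Longleftrightarrow\quad
  \frac{e^{-U_{\theta}(\mathbf{R})/k_{B}T}}{\int e^{-U_{\theta}(\mathbf{R}')/k_{B}T}\, d\mathbf{R}'}
  \approx p_{\mathrm{atomistic}}(\mathbf{R})
\]
```

In the implicit solvent setting, the mapping M would simply keep the solute coordinates and discard the solvent, so the integral runs over solvent degrees of freedom. The integral generally has no closed form, which is the "lack of analytical expressions for the PMF" noted in the abstract.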