Stephan Thaler, Gregor Doehner, Julija Zavadlav
Professorship of Multiscale Modeling of Fluid Materials, Department of Engineering Physics and Computation, TUM School of Engineering and Design, Technical University of Munich, 85748 Garching near Munich, Germany.
Munich Data Science Institute, Technical University of Munich, 85748 Garching near Munich, Germany.
J Chem Theory Comput. 2023 Jul 25;19(14):4520-4532. doi: 10.1021/acs.jctc.2c01267. Epub 2023 Apr 4.
Neural network (NN) potentials promise highly accurate molecular dynamics (MD) simulations within the computational complexity of classical MD force fields. However, when applied outside their training domain, NN potential predictions can be inaccurate, increasing the need for Uncertainty Quantification (UQ). Bayesian modeling provides the mathematical framework for UQ, but classical Bayesian methods based on Markov chain Monte Carlo (MCMC) are computationally intractable for NN potentials. By training graph NN potentials for coarse-grained systems of liquid water and alanine dipeptide, we demonstrate here that scalable Bayesian UQ via stochastic gradient MCMC (SG-MCMC) yields reliable uncertainty estimates for MD observables. We show that cold posteriors can reduce the required training data size and that for reliable UQ, multiple Markov chains are needed. Additionally, we find that SG-MCMC and the Deep Ensemble method achieve comparable results, despite shorter training and less hyperparameter tuning of the latter. We show that both methods can capture aleatoric and epistemic uncertainty reliably, but not systematic uncertainty, which needs to be minimized by adequate modeling to obtain accurate credible intervals for MD observables. Our results represent a step toward accurate UQ that is of vital importance for trustworthy NN potential-based MD simulations required for decision-making in practice.
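As an illustration of the ideas named in the abstract (SG-MCMC sampling, multiple Markov chains, a posterior temperature, and a credible interval for a derived quantity), the following is a minimal sketch on a toy one-parameter regression problem. It is not the authors' implementation: it uses stochastic gradient Langevin dynamics (SGLD) as one simple SG-MCMC variant, and the function name `sgld_chain`, the toy data, and all hyperparameter values are assumptions chosen for illustration only.

```python
import numpy as np

def sgld_chain(x, y, n_steps=2000, batch=32, eps=1e-3, temperature=1.0,
               noise_std=0.5, prior_std=10.0, seed=0):
    """Sample the slope w of y = w * x + noise with SGLD (toy model, hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    w = rng.normal()  # random initialization, different for each seed
    samples = []
    for step in range(n_steps):
        idx = rng.choice(n, size=batch, replace=False)
        resid = y[idx] - w * x[idx]
        # Minibatch estimate of the full-data log-likelihood gradient
        grad_lik = (n / batch) * np.sum(resid * x[idx]) / noise_std**2
        # Gradient of a zero-mean Gaussian log-prior
        grad_prior = -w / prior_std**2
        # Langevin update; temperature < 1 shrinks the injected noise
        # and corresponds to sampling a "cold" posterior
        noise = np.sqrt(eps * temperature) * rng.normal()
        w = w + 0.5 * eps * (grad_lik + grad_prior) + noise
        if step >= n_steps // 2:  # discard the first half as burn-in
            samples.append(w)
    return np.array(samples)

# Toy data with a known slope of 2.0 (assumed for illustration)
data_rng = np.random.default_rng(42)
x = data_rng.normal(size=200)
y = 2.0 * x + 0.5 * data_rng.normal(size=200)

# Several independent chains, pooled into one set of posterior samples
chains = [sgld_chain(x, y, seed=s) for s in range(4)]
pooled = np.concatenate(chains)

# 95% credible interval for a derived quantity: the prediction at x* = 1.5
x_star = 1.5
lo, hi = np.percentile(pooled * x_star, [2.5, 97.5])
print(f"posterior mean of w: {pooled.mean():.3f}")
print(f"95% credible interval at x*={x_star}: [{lo:.3f}, {hi:.3f}]")
```

In this sketch the derived quantity is a single prediction; for an NN potential, the analogous step would be to propagate each sampled parameter set through an MD simulation and collect the resulting observable. Passing `temperature` below 1 to `sgld_chain` gives a cold-posterior variant of the same sampler.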