Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, United Kingdom.
Centrum Wiskunde & Informatica, Scientific Computing Group, Amsterdam 1090 GB, The Netherlands.
J Chem Theory Comput. 2021 Aug 10;17(8):5187-5197. doi: 10.1021/acs.jctc.1c00526. Epub 2021 Jul 19.
Classical molecular dynamics is a computer simulation technique that is in widespread use across many areas of science, from physics and chemistry to materials, biology, and medicine. The method continues to attract criticism due to its oft-reported lack of reproducibility, which is in part due to a failure to subject it to reliable uncertainty quantification (UQ). Here we show that the uncertainty arises from a combination of (i) the input parameters and (ii) the intrinsic stochasticity of the method, which is controlled by the random seeds. To illustrate the situation, we perform a systematic UQ analysis of a widely used molecular dynamics code (NAMD), applied to estimate the binding free energy of a ligand bound to a protein. In particular, we replace the usually fixed input parameters with random variables, systematically distributed about their mean values, and study the resulting distribution of the simulation output. We also perform a sensitivity analysis, which reveals that, out of a total of 175 parameters, just six dominate the variance in the code output. Furthermore, we show that binding energy calculations dampen the input uncertainty, in the sense that the variation around the mean output free energy is less than the variation around the mean of the assumed input distributions, provided the output is ensemble-averaged over the random seeds. Without such ensemble averaging, the predicted free energy is five times more uncertain. The distribution of the predicted properties is thus strongly dependent upon the random seed. Owing to this substantial uncertainty, robust statistical measures of uncertainty in molecular dynamics simulation require the use of ensembles in all contexts.
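The two sources of uncertainty named in the abstract, input parameters and random seeds, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_simulation` is a hypothetical stand-in for a single run (it is not NAMD and has no physical meaning), there are two input parameters rather than 175, and the parameters are drawn uniformly within 10% of their means. It compares the spread of the output when one replica is used per parameter sample with the spread when the output is averaged over an ensemble of seeds.

```python
import random
import statistics


def toy_simulation(params, seed):
    """Hypothetical stand-in for one simulation run (NOT a model of NAMD).

    The deterministic part depends on the input parameters; the noise term
    is fully determined by the random seed, mimicking seed-controlled
    stochasticity.
    """
    rng = random.Random(seed)
    signal = -10.0 + 2.0 * (params[0] - 1.0) + 0.5 * (params[1] - 1.0)
    return signal + rng.gauss(0.0, 1.0)


def sample_params(rng, means, rel_width=0.1):
    """Draw each parameter uniformly within +/- rel_width of its mean.

    The uniform shape and the 10% width are assumptions of this sketch.
    """
    return [m * (1.0 + rng.uniform(-rel_width, rel_width)) for m in means]


def run_study(n_samples=200, n_replicas=25, master_seed=0):
    """Return (std dev with one replica, std dev with ensemble averaging)."""
    rng = random.Random(master_seed)
    means = [1.0, 1.0]
    single, averaged = [], []
    for _ in range(n_samples):
        params = sample_params(rng, means)
        # Each replica shares the same inputs and differs only in its seed.
        seeds = [rng.randrange(2**31) for _ in range(n_replicas)]
        outputs = [toy_simulation(params, s) for s in seeds]
        single.append(outputs[0])                   # no ensemble averaging
        averaged.append(statistics.fmean(outputs))  # averaged over seeds
    return statistics.stdev(single), statistics.stdev(averaged)


if __name__ == "__main__":
    sd_single, sd_ensemble = run_study()
    print(f"std dev, one replica per sample: {sd_single:.3f}")
    print(f"std dev, ensemble-averaged:      {sd_ensemble:.3f}")
```

In this toy setting the ensemble-averaged output varies less than the single-replica output, because averaging over seeds suppresses the seed-driven noise and leaves mainly the parameter-driven variation. The size of the reduction depends entirely on the assumed noise level and ensemble size and should not be read as reproducing the factor of five reported above.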