Shirts Michael R, Ferguson Andrew L
Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States.
Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States.
J Chem Theory Comput. 2020 Jul 14;16(7):4107-4125. doi: 10.1021/acs.jctc.0c00077. Epub 2020 Jul 2.
Free energies as a function of a selected set of collective variables are commonly computed in molecular simulation and are of significant value in understanding and engineering molecular behavior. These free energy surfaces are most commonly estimated using variants of histogramming techniques, but such approaches obscure two important facets of these functions. First, the empirical data along the collective variables consist of an ensemble of discrete observations, and coarsening these observations into histogram bins incurs an unnecessary loss of information. Second, the free energy surface is itself almost always a continuous function, and representing it by a histogram introduces inherent discretization error. In this study, we relate the discrete observations from biased simulations to the inferred underlying continuous probability distribution over the collective variables and derive histogram-free techniques for estimating this free energy surface. We reformulate free energy surface estimation as minimization of a Kullback-Leibler divergence between a continuous trial function and the discrete empirical distribution and show that this is equivalent to likelihood maximization of a trial function given a set of sampled data. We then present a fully Bayesian treatment of this formalism, which enables the use of powerful Bayesian tools such as regularizing priors, uncertainty quantification, and model selection. We demonstrate this formalism by analyzing umbrella sampling simulations of the χ torsion of a valine side chain in the L99A mutant of T4 lysozyme with benzene bound in the cavity.
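A minimal sketch of the likelihood formulation summarized above, under assumptions that are ours rather than the paper's: a periodic collective variable (such as a χ torsion) in kT units, harmonic umbrella restraints, and a truncated Fourier series as the continuous trial function F(x). Formally, the KL divergence from the empirical distribution (a sum of delta functions at the samples) to a trial density differs from the negative mean log-likelihood only by a term that does not depend on the trial parameters, so minimizing one maximizes the other. Each sample x_n from window k is scored by the biased density p_k(x) ∝ exp[-F(x) - u_k(x)], normalized by its own Z_k. The function names (`fit_free_energy`, `sample_umbrella`), the plain gradient-ascent optimizer, and the synthetic data are hypothetical illustrations; the Bayesian extension (priors, uncertainty quantification) is not shown.

```python
import math
import random

TWO_PI = 2.0 * math.pi


def wrap(x):
    """Map an angle onto [-pi, pi)."""
    return (x + math.pi) % TWO_PI - math.pi


def basis(x, n_modes):
    """Fourier basis [cos x, sin x, cos 2x, sin 2x, ...] with no constant term,
    since F(x) is only defined up to an additive constant."""
    out = []
    for m in range(1, n_modes + 1):
        out.append(math.cos(m * x))
        out.append(math.sin(m * x))
    return out


def umbrella_bias(x, center, kappa):
    """Harmonic restraint in kT units, using the minimum-image angle difference."""
    d = wrap(x - center)
    return 0.5 * kappa * d * d


def fit_free_energy(samples, centers, kappa, n_modes=2, n_grid=120,
                    step=0.8, n_iter=400):
    """Gradient ascent on the mean multistate log-likelihood
        (1/N) sum_k sum_{n in k} [-F(x_n) - u_k(x_n) - ln Z_k],
    with Z_k = integral of exp(-F - u_k) dx and F(x) = sum_j theta_j phi_j(x).
    The samples are never binned; the grid is only a quadrature rule for Z_k."""
    n_total = sum(len(s) for s in samples)
    n_basis = 2 * n_modes
    # Data term of the gradient: average of each basis function over all samples.
    data_mean = [0.0] * n_basis
    for window in samples:
        for x in window:
            for j, p in enumerate(basis(x, n_modes)):
                data_mean[j] += p / n_total
    # Rectangle-rule grid on the periodic domain (spectrally accurate here).
    grid = [-math.pi + TWO_PI * i / n_grid for i in range(n_grid)]
    phi_grid = [basis(g, n_modes) for g in grid]
    bias_grid = [[umbrella_bias(g, c, kappa) for g in grid] for c in centers]

    theta = [0.0] * n_basis
    for _ in range(n_iter):
        f_grid = [sum(t * p for t, p in zip(theta, phis)) for phis in phi_grid]
        model_mean = [0.0] * n_basis
        for window, bias in zip(samples, bias_grid):
            # Biased density exp(-F - u_k) on the grid, shifted for numerical stability.
            expo = [-f - u for f, u in zip(f_grid, bias)]
            top = max(expo)
            w = [math.exp(e - top) for e in expo]
            z = sum(w)
            frac = len(window) / n_total
            for wi, phis in zip(w, phi_grid):
                for j, p in enumerate(phis):
                    model_mean[j] += frac * wi * p / z
        # d(mean log-likelihood)/d(theta_j) = <phi_j>_model - <phi_j>_data
        grad = [m - d for m, d in zip(model_mean, data_mean)]
        theta = [t + step * g for t, g in zip(theta, grad)]
    return theta


def sample_umbrella(true_f, center, kappa, n, rng, n_fine=4000):
    """Synthetic data only: draw n samples from exp(-F(x) - u_k(x)) by
    inverse-transform sampling on a fine grid, jittered within each cell."""
    h = TWO_PI / n_fine
    fine = [-math.pi + h * (i + 0.5) for i in range(n_fine)]
    weights = [math.exp(-true_f(x) - umbrella_bias(x, center, kappa)) for x in fine]
    picks = rng.choices(fine, weights=weights, k=n)
    return [wrap(x + rng.uniform(-0.5 * h, 0.5 * h)) for x in picks]


def true_f(x):
    # Hypothetical "true" surface in kT units: 2 cos x + 0.5 sin 2x.
    return 2.0 * math.cos(x) + 0.5 * math.sin(2.0 * x)


rng = random.Random(0)
kappa = 5.0
centers = [-math.pi + TWO_PI * k / 8 for k in range(8)]
samples = [sample_umbrella(true_f, c, kappa, 1500, rng) for c in centers]
theta = fit_free_energy(samples, centers, kappa)
print([round(t, 2) for t in theta])  # expected to lie near [2.0, 0.0, 0.0, 0.5]
```

Because the trial function is linear in its coefficients, the log-likelihood is concave, so a fixed step size below 1/(largest curvature) converges; with this two-mode basis the curvature is bounded by 2, which is why `step=0.8` is used. Swapping in a different basis or a quasi-Newton optimizer would not change the objective, only how it is maximized.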