Department of Chemistry, Stanford University, Stanford, California 94305, USA.
J Chem Phys. 2010 Oct 14;133(14):144113. doi: 10.1063/1.3496438.
The dynamics of many biological processes of interest, such as protein folding, are slow and complicated enough that a single molecular dynamics trajectory of the entire process is difficult to obtain in a reasonable amount of time. Moreover, one such simulation may not be sufficient to understand the mechanism of the process, and multiple simulations may be necessary. One approach to circumventing this computational barrier is the use of Markov state models, which can be constructed from a large number of shorter simulations instead of a single long one. This paper presents a new Bayesian method for constructing Markov models from simulation data. A Markov model is specified by (τ,P,T), where τ is the mesoscopic time step, P is a partition of configuration space into mesostates, and T is an N(P)×N(P) matrix of transition probabilities between the mesostates in one mesoscopic time step, where N(P) is the number of mesostates in P. The method differs from previous Bayesian methods in several ways. (1) It uses Bayesian analysis to determine the partition as well as the transition probabilities. (2) It allows the construction of a Markov model for any chosen mesoscopic time scale τ. (3) It constructs Markov models for which the diagonal elements of T are all greater than or equal to 0.5. Such a model is called a "consistent mesoscopic Markov model" (CMMM), and it has important advantages for understanding the dynamics on a mesoscopic time scale. The Bayesian method uses simulation data to find a posterior probability distribution for (P,T) for any chosen τ. This distribution can be regarded as the Bayesian probability that the kinetics observed in the atomistic simulation data on the mesoscopic time scale τ was generated by the CMMM specified by (P,T). An optimization algorithm is used to find the most probable CMMM for the chosen mesoscopic time step. We applied this method to several toy systems (random walks in one and two dimensions) as well as to the dynamics of alanine dipeptide in water. The resulting Markov state models successfully captured the dynamics of the test systems on a variety of mesoscopic time scales.
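To make the (τ,P,T) notation concrete, the following is a minimal sketch of how a transition matrix T might be estimated from many short trajectories once a partition P and a time step τ are fixed, and how the diagonal condition T_ii ≥ 0.5 can be checked. It is not the Bayesian method of the paper: it uses a plain count-and-normalize (maximum-likelihood) estimate and takes the partition as given instead of inferring it. All function names, the 20-site reflecting random walk, and the two-mesostate partition are assumptions made for illustration only.

```python
import random


def count_transitions(trajectories, assign, n_states, lag):
    """Count mesostate-to-mesostate transitions separated by `lag` steps.

    `assign` maps a configuration to a mesostate index (the partition P);
    `lag` plays the role of the mesoscopic time step tau, in units of steps.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        states = [assign(x) for x in traj]
        for i in range(len(states) - lag):
            counts[states[i]][states[i + lag]] += 1
    return counts


def row_normalize(counts):
    """Turn a count matrix into a row-stochastic matrix (rows with no counts stay zero)."""
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def is_consistent(matrix):
    """Check the diagonal condition from the abstract: every T[i][i] >= 0.5."""
    return all(matrix[i][i] >= 0.5 for i in range(len(matrix)))


def random_walk(n_steps, n_sites, rng):
    """Hypothetical toy system: unbiased 1D walk clamped to [0, n_sites - 1]."""
    x = rng.randrange(n_sites)
    traj = [x]
    for _ in range(n_steps):
        x = min(max(x + rng.choice((-1, 1)), 0), n_sites - 1)
        traj.append(x)
    return traj


if __name__ == "__main__":
    rng = random.Random(0)
    # Many short trajectories instead of one long one.
    trajs = [random_walk(50, 20, rng) for _ in range(200)]
    # Assumed partition: sites 0-9 form mesostate 0, sites 10-19 form mesostate 1.
    counts = count_transitions(trajs, lambda x: x // 10, n_states=2, lag=5)
    T = row_normalize(counts)
    print(T, is_consistent(T))
```

In this sketch, changing `lag` or the `assign` function changes which models satisfy the diagonal condition, which is the kind of dependence on τ and P that the posterior over (P,T) described above is meant to resolve.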