HKUST-Shenzhen Research Institute, Hi-Tech Park, Nanshan, Shenzhen 518057, China.
Department of Statistics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong.
J Chem Phys. 2018 Aug 21;149(7):072337. doi: 10.1063/1.5027001.
In recent years, the Markov state model (MSM) has become a popular approach for studying the conformational dynamics of complex biological systems. Built from a large number of short molecular dynamics simulation trajectories, an MSM can predict the long-time-scale dynamics of such systems. However, to achieve Markovianity, an MSM often contains hundreds or thousands of states (microstates), which hinders human interpretation of the underlying mechanism. One way to reduce the number of states is to lump kinetically similar states together, thereby coarse-graining the microstates into macrostates. In this work, we introduce a probabilistic lumping algorithm, the Gibbs lumping algorithm, which uses Bayesian inference to assign a probability to any given kinetic lumping. In our algorithm, the transitions among kinetically distinct macrostates are modeled by Poisson processes, which reflect the separation of time scales in the underlying free energy landscape of biomolecules. Furthermore, to facilitate the search for the optimal kinetic lumping (i.e., the lumped model with the highest probability), we introduce a Gibbs sampling algorithm. To demonstrate the power of our new method, we apply it to three systems: a 2D potential, alanine dipeptide, and a WW protein domain. In comparison with six other popular lumping algorithms, we show that our method consistently produces the lumped macrostate model with the highest probability as well as the largest metastability. We anticipate that our Gibbs lumping algorithm holds great promise for wide application in investigating conformational changes in biological macromolecules.
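To make the idea of lumping concrete, the following minimal sketch coarse-grains a microstate transition count matrix into macrostates and scores the result by metastability. It is not the Gibbs lumping algorithm itself: the abstract does not give its formulas, so the sketch only compares two fixed lumpings. The function names, the toy count matrix, and the definition of metastability as the sum of macrostate self-transition probabilities are assumptions made for illustration.

```python
def lump_counts(counts, assignment, n_macro):
    """Sum microstate transition counts into macrostate blocks.

    counts[i][j] is the number of observed i -> j transitions and
    assignment[i] is the macrostate index of microstate i.
    """
    macro = [[0.0] * n_macro for _ in range(n_macro)]
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            macro[assignment[i]][assignment[j]] += c
    return macro


def metastability(macro_counts):
    """Sum of self-transition probabilities of the lumped model.

    Assumed definition: the trace of the row-normalized macrostate
    transition matrix. Rows with no counts contribute nothing.
    """
    total = 0.0
    for a, row in enumerate(macro_counts):
        row_sum = sum(row)
        if row_sum > 0:
            total += row[a] / row_sum
    return total


# Hypothetical counts for four microstates forming two basins:
# {0, 1} and {2, 3} exchange rapidly within a basin, rarely between.
counts = [
    [90, 10, 0, 0],
    [10, 88, 2, 0],
    [0, 2, 88, 10],
    [0, 0, 10, 90],
]

kinetic_lumping = [0, 0, 1, 1]  # groups kinetically similar states
mixed_lumping = [0, 1, 0, 1]    # splits each basin across macrostates

q_kinetic = metastability(lump_counts(counts, kinetic_lumping, 2))
q_mixed = metastability(lump_counts(counts, mixed_lumping, 2))
print(f"kinetic lumping: {q_kinetic:.2f}")  # kinetic lumping: 1.98
print(f"mixed lumping:   {q_mixed:.2f}")    # mixed lumping:   1.78
```

The lumping that follows the two basins keeps nearly all transitions inside a macrostate and so scores higher. A search procedure such as the Gibbs sampler described above would explore many candidate assignments rather than two hand-picked ones.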