Huang Xuhui, Yao Yuan, Bowman Gregory R, Sun Jian, Guibas Leonidas J, Carlsson Gunnar, Pande Vijay S
Department of Chemistry, The Hong Kong University of Science & Technology, Kowloon, Hong Kong, China.
Pac Symp Biocomput. 2010:228-39. doi: 10.1142/9789814295291_0025.
Simulating biologically relevant timescales at atomic resolution is challenging because typical atomistic simulations are at least two orders of magnitude shorter than those timescales. Markov State Models (MSMs) offer one way to bridge this gap without sacrificing atomic resolution, by extracting long-timescale dynamics from short simulations. MSMs coarse-grain space by dividing conformational space into long-lived, or metastable, states. This is equivalent to coarse-graining time by integrating out the fast motions within metastable states. Varying the degree of coarse-graining varies the resolution of an MSM; MSMs are therefore inherently multi-resolution. Here we introduce a new algorithm, Super-level-set Hierarchical Clustering (SHC), which is, to our knowledge, the first algorithm focused on constructing MSMs at multiple resolutions. The key idea of the algorithm is to generate a set of super levels covering regions of phase space with different densities, cluster each super level separately, and finally recombine this information into a single MSM. SHC can produce MSMs at different resolutions by using different super density level sets. To demonstrate the power of this algorithm, we apply it to a small RNA hairpin, generating MSMs at four different resolutions. We validate these MSMs by showing that they reproduce the original simulation data. We further extract long-timescale folding dynamics from these models. The results show that there are no metastable on-pathway intermediate states. Instead, the folded state serves as a hub directly connected to multiple unfolded/misfolded states, which are separated from one another by large free energy barriers.
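The steps named in the abstract can be illustrated with a minimal pure-Python sketch on a 1-D toy trajectory. The sketch estimates density, takes super-level sets (points whose density is at or above a threshold), clusters each level, and builds an MSM from the resulting states. It is not the authors' implementation. The neighbour-count density estimate, single-linkage clustering with a distance cutoff, nearest-centre state assignment, and the function names (`local_density`, `super_level_set`, `cluster_1d`, `transition_matrix`, `build_msms`) are all hypothetical choices made for illustration. Unlike SHC, this sketch builds one separate MSM per density level and does not recombine the levels into a single MSM.

```python
def local_density(points, radius):
    """Hypothetical density estimate: number of points within `radius` (self included)."""
    return [sum(1 for q in points if abs(p - q) <= radius) for p in points]


def super_level_set(density, threshold):
    """Indices of points whose estimated density is >= threshold."""
    return [i for i, d in enumerate(density) if d >= threshold]


def cluster_1d(points, indices, gap):
    """Single-linkage clustering of 1-D points cut at distance `gap`:
    sort the points and split wherever consecutive values are more than `gap` apart."""
    if not indices:
        return []
    order = sorted(indices, key=lambda i: points[i])
    clusters, current = [], [order[0]]
    for prev, nxt in zip(order, order[1:]):
        if points[nxt] - points[prev] > gap:
            clusters.append(current)
            current = []
        current.append(nxt)
    clusters.append(current)
    return clusters


def transition_matrix(labels, n_states, lag=1):
    """Row-normalised transition counts between states observed `lag` frames apart.
    Rows with no observed transitions are left as all zeros."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(labels, labels[lag:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def build_msms(traj, radius, thresholds, gap, lag=1):
    """Build one MSM per density threshold (hypothetical helper, not SHC's recombination)."""
    density = local_density(traj, radius)
    msms = {}
    for t in thresholds:
        clusters = cluster_1d(traj, super_level_set(density, t), gap)
        if not clusters:
            continue  # no point is dense enough at this level
        centers = [sum(traj[i] for i in c) / len(c) for c in clusters]
        # Assign every frame (not only the dense ones) to its nearest cluster centre.
        labels = [min(range(len(centers)), key=lambda k: abs(x - centers[k])) for x in traj]
        msms[t] = (centers, transition_matrix(labels, len(centers), lag))
    return msms
```

For a trajectory that hops between a densely sampled basin near 0 and a more sparsely sampled basin near 3, a low threshold keeps both basins and yields a two-state model. A high threshold keeps only the denser basin and yields a one-state model. Varying the threshold in this way is a crude analogue of obtaining MSMs at different resolutions from different super density level sets.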