Husic Brooke E, McKiernan Keri A, Wayment-Steele Hannah K, Sultan Mohammad M, Pande Vijay S
Department of Chemistry, Stanford University, Stanford, California 94305, United States.
J Chem Theory Comput. 2018 Feb 13;14(2):1071-1082. doi: 10.1021/acs.jctc.7b01004. Epub 2018 Jan 24.
Markov state models (MSMs) are a powerful framework for the analysis of molecular dynamics data sets, such as protein folding simulations, because of their straightforward construction and statistical rigor. The coarse-graining of MSMs into an interpretable number of macrostates is a crucial step for connecting theoretical results with experimental observables. Here we present the minimum variance clustering approach (MVCA) for the coarse-graining of MSMs into macrostate models. The method utilizes agglomerative clustering with Ward's minimum variance objective function, and the similarity of the microstate dynamics is determined using the Jensen-Shannon divergence between the corresponding rows in the MSM transition probability matrix. We first show that MVCA produces intuitive results for a simple tripeptide system and is robust toward long-duration statistical artifacts. MVCA is then applied to two protein folding simulations of the same protein in different force fields to demonstrate that a different number of macrostates is appropriate for each model, revealing a misfolded state present in only one of the simulations. Finally, we show that the same method can be used to analyze a data set containing many MSMs from simulations in different force fields by aggregating them into groups and quantifying their dynamical similarity in the context of force field parameter choices. The minimum variance clustering approach with the Jensen-Shannon divergence provides a powerful tool to group dynamics by similarity, both among model states and among dynamical models themselves.
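The abstract names two ingredients. The first is a dissimilarity between microstates, given by the Jensen-Shannon (JS) divergence between their rows of the MSM transition probability matrix. The second is agglomerative clustering under Ward's minimum variance objective. The pure-Python sketch below combines them under stated assumptions and is not the authors' implementation. It applies the Lance-Williams form of Ward's update directly to the JS divergence. That update is exact only for squared Euclidean distances, so here it serves as a heuristic analogue. The paper or its software may instead transform the divergence, for example by taking its square root (the JS distance, which is a metric), or may use a library linkage routine. The names `js_divergence` and `ward_lump` and the 4-state toy matrix are hypothetical and exist only for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete probability distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture; m_i > 0 wherever p_i or q_i > 0
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def _pair(x, y):
    """Canonical (smaller, larger) key for a pair of cluster ids."""
    return (x, y) if x < y else (y, x)


def ward_lump(T, n_macrostates):
    """Hypothetical helper: lump the microstates (rows of T) into n_macrostates groups.

    Dissimilarity = JS divergence between rows of the transition matrix T.
    Merging uses the Lance-Williams form of Ward's minimum variance update,
    applied here to the JS divergence (an assumption of this sketch).
    Returns one macrostate label per microstate.
    """
    n = len(T)
    if not 1 <= n_macrostates <= n:
        raise ValueError("n_macrostates must be between 1 and the number of microstates")
    # Pairwise dissimilarities between microstates, keyed by (smaller id, larger id).
    d = {(i, j): js_divergence(T[i], T[j]) for i in range(n) for j in range(i + 1, n)}
    clusters = {i: [i] for i in range(n)}  # active cluster id -> member microstates
    next_id = n
    while len(clusters) > n_macrostates:
        a, b = min(d, key=d.get)  # closest pair of active clusters
        na, nb = len(clusters[a]), len(clusters[b])
        merged = clusters.pop(a) + clusters.pop(b)
        d_ab = d.pop((a, b))
        for c in list(clusters):
            nc = len(clusters[c])
            d_ac = d.pop(_pair(a, c))
            d_bc = d.pop(_pair(b, c))
            # Lance-Williams recurrence for Ward's method; next_id exceeds every active id.
            d[(c, next_id)] = ((na + nc) * d_ac + (nb + nc) * d_bc - nc * d_ab) / (na + nb + nc)
        clusters[next_id] = merged
        next_id += 1
    labels = [0] * n
    # Number macrostates in order of their lowest-index microstate.
    for label, members in enumerate(sorted(clusters.values(), key=min)):
        for i in members:
            labels[i] = label
    return labels


if __name__ == "__main__":
    # Toy 4-state transition matrix (rows sum to 1): rows 0/1 and rows 2/3 are similar.
    T = [
        [0.45, 0.45, 0.05, 0.05],
        [0.40, 0.50, 0.05, 0.05],
        [0.05, 0.05, 0.50, 0.40],
        [0.05, 0.05, 0.45, 0.45],
    ]
    print(ward_lump(T, 2))  # [0, 0, 1, 1]
```

States 0 and 1 have nearly identical outgoing transition distributions, and so do states 2 and 3. As a result, the first two merges join those pairs. Stopping at two clusters yields two macrostates. Running the merges all the way down to one cluster would give the full hierarchy, from which a number of macrostates can be chosen.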