Departments of Chemistry and Molecular and Cell Biology, University of California, Berkeley, California 94720, USA.
J Chem Phys. 2013 Sep 28;139(12):121905. doi: 10.1063/1.4812768.
Markov models and master equations are a powerful means of modeling dynamic processes like protein conformational changes. However, these models are often difficult to understand because of the enormous number of components and connections between them. Therefore, a variety of methods have been developed to facilitate understanding by coarse-graining these complex models. Here, we employ Bayesian model comparison to determine which of these coarse-graining methods provides the models that are most faithful to the original set of states. We find that the Bayesian agglomerative clustering engine and the hierarchical Nyström expansion graph (HNEG) typically provide the best performance. Surprisingly, the original Perron cluster cluster analysis (PCCA) method often provides the next best results, outperforming the newer PCCA+ method and the most probable paths algorithm. We also show that the differences between the models are qualitatively significant, rather than being minor shifts in the boundaries between states. The performance of the methods correlates well with the entropy of the resulting coarse-grainings, suggesting that finding states with more similar populations (i.e., avoiding low population states that may just be noise) gives better results.
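To illustrate the closing observation about entropy, the following minimal sketch compares two hypothetical two-state lumpings of the same five microstates. It assumes that "entropy of the coarse-graining" means the Shannon entropy of the macrostate populations; the abstract does not give the exact definition. The function names, the populations, and the assignments are all illustrative and are not taken from the paper.

```python
import math

def macrostate_populations(micro_pops, assignment, n_macro):
    """Sum microstate populations into macrostates and normalize.

    assignment[i] is the (hypothetical) macrostate index of microstate i.
    """
    pops = [0.0] * n_macro
    for p, m in zip(micro_pops, assignment):
        pops[m] += p
    total = sum(pops)
    return [p / total for p in pops]

def coarse_graining_entropy(macro_pops):
    """Shannon entropy (in nats) of the macrostate populations.

    Assumption: this is the entropy measure meant in the text.
    """
    return -sum(p * math.log(p) for p in macro_pops if p > 0)

# Illustrative equilibrium populations for five microstates (made-up numbers).
micro_pops = [0.30, 0.20, 0.25, 0.20, 0.05]

# Two candidate lumpings into two macrostates.
balanced = [0, 0, 1, 1, 1]  # macrostate populations near 0.50 / 0.50
lopsided = [0, 0, 0, 0, 1]  # macrostate populations near 0.95 / 0.05

h_balanced = coarse_graining_entropy(macrostate_populations(micro_pops, balanced, 2))
h_lopsided = coarse_graining_entropy(macrostate_populations(micro_pops, lopsided, 2))

print(f"balanced lumping entropy: {h_balanced:.3f} nats")
print(f"lopsided lumping entropy: {h_lopsided:.3f} nats")
```

The balanced lumping has the higher entropy because neither macrostate is a low-population outlier. This only demonstrates the entropy measure itself; it does not reproduce the Bayesian model comparison used to rank the methods.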