Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, 660 South Euclid Avenue, St. Louis, Missouri 63110, USA.
J Chem Phys. 2019 Jan 28;150(4):044108. doi: 10.1063/1.5063794.
Markov state models (MSMs) are quantitative models of protein dynamics that are useful for uncovering the structural fluctuations that proteins undergo, as well as the mechanisms of these conformational changes. Given the vastness of conformational space, there has been ongoing interest in identifying a small number of states that capture the essential features of a protein. Generally, this is achieved by making assumptions about the properties of relevant features, for example, that the most important features are those that change slowly. An alternative strategy is to keep as many degrees of freedom as possible and subsequently learn from the model which of the features are most important. In these larger models, however, traditional approaches quickly become computationally intractable. In this paper, we present enspara, a library for working with MSMs that provides several novel algorithms and specialized data structures that dramatically improve the scalability of traditional MSM methods. These include ragged arrays for minimizing memory requirements, Message Passing Interface (MPI)-parallelized implementations of compute-intensive operations, and a flexible framework for model construction and analysis.
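To make the ragged-array idea concrete, the following is a minimal sketch in Python with NumPy. It is not enspara's API: the class `RaggedArray`, the function `transition_counts`, and the toy trajectories are hypothetical names and data chosen for illustration. The sketch stores trajectories of unequal length in one flat buffer plus an offsets array, so no padding is needed, and then counts state-to-state transitions per trajectory and row-normalizes the counts, which is one simple way to estimate MSM transition probabilities.

```python
import numpy as np


class RaggedArray:
    """Hypothetical minimal ragged array: one flat buffer plus row offsets."""

    def __init__(self, rows):
        lengths = np.array([len(r) for r in rows], dtype=np.int64)
        # offsets[i]:offsets[i + 1] delimits row i inside the flat buffer.
        self.offsets = np.concatenate(([0], np.cumsum(lengths)))
        self.data = np.concatenate([np.asarray(r) for r in rows])

    def __len__(self):
        return len(self.offsets) - 1

    def __getitem__(self, i):
        # Returns a view into the flat buffer (no copy); i must be >= 0.
        return self.data[self.offsets[i]:self.offsets[i + 1]]


def transition_counts(assignments, n_states, lag=1):
    """Count i -> j transitions at the given lag, one trajectory at a time."""
    counts = np.zeros((n_states, n_states), dtype=np.int64)
    for i in range(len(assignments)):
        traj = assignments[i]
        # np.add.at accumulates repeated (from, to) index pairs correctly.
        np.add.at(counts, (traj[:-lag], traj[lag:]), 1)
    return counts


# Toy state assignments for three trajectories of different lengths
# (illustrative data only).
trajs = RaggedArray([[0, 0, 1, 2], [2, 1], [1, 1, 0, 2, 2, 2]])

counts = transition_counts(trajs, n_states=3)

# Row-normalize counts into transition probabilities. This assumes every
# state has at least one outgoing transition, which holds for the toy data.
probs = counts / counts.sum(axis=1, keepdims=True)
```

Because each trajectory is read back as its own slice, no transition is counted across the boundary between two trajectories, and the memory footprint is the total number of frames rather than the number of trajectories times the longest trajectory length.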