Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, United States.
Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States.
J Chem Theory Comput. 2024 Jun 25;20(12):5352-5367. doi: 10.1021/acs.jctc.4c00449. Epub 2024 Jun 10.
Markov state models (MSMs) have proven valuable in studying the dynamics of protein conformational changes via statistical analysis of molecular dynamics simulations. In MSMs, the complex configuration space is coarse-grained into conformational states, with dynamics modeled by a series of Markovian transitions among these states at discrete lag times. Constructing the Markovian model at a specific lag time necessitates defining states that circumvent significant internal energy barriers, enabling internal dynamics relaxation within the lag time. This process effectively coarse-grains time and space, integrating out rapid motions within metastable states. Thus, MSMs possess a multiresolution nature, where the granularity of states can be adjusted according to the time resolution, offering flexibility in capturing system dynamics. This work introduces a continuous embedding approach for molecular conformations using the state predictive information bottleneck (SPIB), a framework that unifies dimensionality reduction and state space partitioning via a continuous, machine-learned basis set. Without explicit optimization of the VAMP-based scores, SPIB demonstrates state-of-the-art performance in identifying slow dynamical processes and constructing predictive multiresolution Markovian models. Through applications to well-validated mini-proteins, SPIB showcases unique advantages compared to competing methods. It autonomously and self-consistently adjusts the number of metastable states based on a specified minimal time resolution, eliminating the need for manual tuning. While maintaining efficacy in dynamical properties, SPIB excels in accurately distinguishing metastable states and capturing numerous well-populated macrostates. This contrasts with existing VAMP-based methods, which often emphasize slow dynamics at the expense of incorporating numerous sparsely populated states. Furthermore, SPIB's ability to learn a low-dimensional continuous embedding of the underlying MSMs enhances the interpretation of dynamic pathways. With these benefits, we propose SPIB as an easy-to-implement methodology for end-to-end MSM construction.
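To make the notion of "Markovian transitions among states at a discrete lag time" concrete, the following is a minimal NumPy sketch of generic MSM estimation from an already-discretized trajectory. It is not the SPIB method and not the authors' code: the function names (`simulate`, `estimate_msm`, `implied_timescales`), the two-state toy transition matrix, and the simple count-symmetrization step are all illustrative assumptions. Given state labels, it counts transitions at lag τ, row-normalizes them into a transition matrix, and converts the eigenvalues λ_i into implied timescales t_i = -τ / ln|λ_i|.

```python
import numpy as np

def simulate(T, n_steps, rng):
    """Sample a discrete trajectory from a known transition matrix (toy data only)."""
    traj = np.empty(n_steps, dtype=int)
    traj[0] = 0
    for t in range(1, n_steps):
        traj[t] = rng.choice(len(T), p=T[traj[t - 1]])
    return traj

def estimate_msm(dtraj, n_states, lag):
    """Row-normalized transition matrix from transition counts at the given lag."""
    counts = np.zeros((n_states, n_states))
    # Count every observed pair (state at t, state at t + lag).
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    # Assumption: symmetrizing counts is a crude way to impose detailed balance;
    # it is only reasonable for equilibrium sampling.
    counts = 0.5 * (counts + counts.T)
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("at least one state was never visited")
    return counts / row_sums

def implied_timescales(T, lag):
    """Implied timescales t_i = -lag / ln|lambda_i| for the non-stationary eigenvalues."""
    eigvals = np.sort(np.linalg.eigvals(T).real)[::-1]
    return -lag / np.log(np.abs(eigvals[1:]))

# Hypothetical two-state system standing in for two metastable conformations.
rng = np.random.default_rng(0)
T_true = np.array([[0.95, 0.05],
                   [0.10, 0.90]])
dtraj = simulate(T_true, 10_000, rng)

lag = 1
T_est = estimate_msm(dtraj, n_states=2, lag=lag)
timescales = implied_timescales(T_est, lag)
print("estimated transition matrix:\n", T_est)
print("implied timescales (in steps):", timescales)
```

In practice one would repeat the estimate over several lag times and check that the implied timescales level off, which is the usual indication that the chosen states are approximately Markovian at that time resolution.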