Bose Samik, Kilinc Ceren, Dickson Alex
Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States.
Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan 48824, United States.
J Chem Theory Comput. 2025 Feb 25;21(4):1805-1816. doi: 10.1021/acs.jctc.4c01141. Epub 2025 Feb 11.
The weighted ensemble (WE) algorithm is gaining popularity as a rare event method for studying long timescale processes with molecular dynamics. WE is particularly useful for determining kinetic properties, such as rates of protein (un)folding and ligand (un)binding, where transition rates can be calculated from the flux of trajectories into a target basin of interest. However, this flux depends exponentially on the number of splitting events that a given trajectory experiences before reaching the target state and can vary by orders of magnitude between WE replicates. Markov state models (MSMs) are helpful tools to aggregate information across multiple WE simulations and have previously been shown to provide more accurate transition rates than WE alone. Discrete-time MSMs are models that coarsely describe the evolution of the system from one discrete state to the next using a discrete lag time, τ. When an MSM is built using conventional MD data, longer values of τ typically provide more accurate results. Combining WE simulations with Markov state modeling presents some additional challenges, especially when using a value of τ that exceeds the lag time between resampling steps in the WE algorithm, τ_WE. Here, we identify a source of bias that occurs when τ > τ_WE, which we refer to as "merging bias". We also propose an algorithm to eliminate the merging bias, which results in merging bias-corrected MSMs, or "MBC-MSMs". Using a simple model system, as well as a complex biomolecular example, we show that MBC-MSMs significantly outperform both τ = τ_WE MSMs and uncorrected MSMs at longer lag times.
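As a rough illustration of what a discrete-time MSM at lag time τ involves, the sketch below builds a row-normalized transition matrix from weighted state-to-state transitions, where each transition carries the weight of the walker that made it. This is a generic weighted-count estimator and not the merging bias correction proposed in the paper; the function name `estimate_transition_matrix`, the toy transitions, and the weights are all hypothetical.

```python
import numpy as np

def estimate_transition_matrix(pairs, weights, n_states):
    """Row-normalized transition matrix from weighted (i, j) pairs.

    pairs   : (state at time t, state at time t + tau) for each observed transition
    weights : walker weight attached to each transition (assumed inputs)
    """
    counts = np.zeros((n_states, n_states))
    for (i, j), w in zip(pairs, weights):
        counts[i, j] += w  # accumulate weight instead of a unit count
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed outgoing weight are left as zeros.
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)

# Hypothetical toy data: three states, five weighted transitions at one lag time.
pairs = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)]
weights = [0.5, 0.25, 0.125, 0.0625, 0.0625]
T = estimate_transition_matrix(pairs, weights, n_states=3)
print(T)
```

Each row of `T` approximates the probability of moving from one discrete state to each other state after one lag time. This naive estimator does nothing about transitions that span a resampling step, which is the regime (τ > τ_WE) where the abstract reports merging bias.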