Wang Dedi, Tiwary Pratyush
Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park 20742, USA.
Department of Chemistry and Biochemistry and Institute for Physical Science and Technology, University of Maryland, College Park 20742, USA.
ArXiv. 2024 Nov 15:arXiv:2406.14839v2.
The weighted ensemble (WE) method is a widely used segment-based sampling technique, renowned for its rigorous treatment of kinetics. The WE framework typically involves first mapping the configuration space onto a low-dimensional collective variable (CV) space and then partitioning that space into bins. The efficacy of WE simulations depends heavily on the choice of CVs and binning scheme. The recently proposed State Predictive Information Bottleneck (SPIB) method has emerged as a promising tool for automatically constructing CVs from data and for guiding enhanced sampling in an iterative manner. In this work, we advance this data-driven pipeline by incorporating prior expert knowledge. Our hybrid approach combines SPIB-learned CVs, which enhance sampling in already explored regions, with expert-based CVs, which guide exploration toward regions of interest, thereby synergizing the strengths of both. Through benchmarking on the alanine dipeptide and chignolin systems, we demonstrate that our hybrid approach effectively guides WE simulations to sample states of interest and reduces run-to-run variance. Moreover, integrating the SPIB model also enhances the analysis and interpretation of WE simulation data by effectively identifying metastable states and pathways and by offering direct visualization of the dynamics.
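To make the WE framework described above more concrete, the following minimal Python sketch shows one common form of it. Walkers are assigned to bins in CV space. Each occupied bin is then brought to a fixed number of walkers by splitting heavy walkers and merging light ones, while total weight is conserved (the classic Huber and Kim style split/merge scheme).

Several parts of the sketch are assumptions made for illustration only:

- The hybrid bin key pairs a discrete state label, standing in for a SPIB-assigned metastable state, with a uniform grid on an expert CV. The authors may combine the two kinds of CVs differently.
- `Walker`, `hybrid_bin`, `propagate`, and `resample` are hypothetical names.
- The toy random-walk dynamics replace a real MD engine.

```python
import math
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Hashable, List, Tuple


@dataclass
class Walker:
    state: int     # discrete label (hypothetical stand-in for a SPIB-assigned state)
    cv: float      # value of an expert-chosen collective variable
    weight: float  # statistical weight; weights over all walkers sum to 1


def hybrid_bin(w: Walker, width: float = 0.5) -> Tuple[int, int]:
    """Hypothetical hybrid bin key: learned state label x uniform grid on the expert CV."""
    return (w.state, math.floor(w.cv / width))


def propagate(w: Walker, rng: random.Random, step: float = 0.2) -> Walker:
    """Toy dynamics: Gaussian step on the CV. The threshold rule that assigns
    the state label is an assumption, not a trained SPIB model."""
    cv = w.cv + rng.gauss(0.0, step)
    return Walker(state=0 if cv < 1.0 else 1, cv=cv, weight=w.weight)


def resample(walkers: List[Walker],
             bin_of: Callable[[Walker], Hashable],
             target: int,
             rng: random.Random) -> List[Walker]:
    """Split/merge so every occupied bin holds exactly `target` walkers,
    conserving the total weight within each bin."""
    assert target >= 1
    bins = defaultdict(list)
    for w in walkers:
        bins[bin_of(w)].append(w)

    out: List[Walker] = []
    for members in bins.values():
        # Split: replace the heaviest walker by two copies with half its weight.
        while len(members) < target:
            members.sort(key=lambda w: w.weight)
            heavy = members.pop()
            half = heavy.weight / 2
            members += [Walker(heavy.state, heavy.cv, half),
                        Walker(heavy.state, heavy.cv, half)]
        # Merge: combine the two lightest walkers. The survivor is chosen with
        # probability proportional to its weight, which keeps the ensemble unbiased.
        while len(members) > target:
            members.sort(key=lambda w: w.weight)
            a, b = members.pop(0), members.pop(0)
            total = a.weight + b.weight
            keep = a if rng.random() < a.weight / total else b
            members.append(Walker(keep.state, keep.cv, total))
        out.extend(members)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    walkers = [Walker(state=0, cv=0.0, weight=1.0)]
    for _ in range(20):
        walkers = [propagate(w, rng) for w in walkers]
        walkers = resample(walkers, hybrid_bin, target=4, rng=rng)
    print(len(walkers), sum(w.weight for w in walkers))
```

Because splitting and merging only redistribute weight within a bin, the total weight stays 1 throughout. This conservation is what lets WE estimate unbiased kinetics.

Conditioning bins on a learned state label, as this sketch does, would let a coarse expert CV grid be resolved separately inside each metastable state. Whether that matches the paper's hybrid scheme is an assumption.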