College of Life Sciences, Zhejiang University, Hangzhou 310027, China.
School of Mathematical Sciences, Institute of Natural Sciences, and MOE-LSC, Shanghai Jiao Tong University, Shanghai 200240, China.
J Chem Theory Comput. 2024 Oct 8;20(19):8569-8582. doi: 10.1021/acs.jctc.4c00764. Epub 2024 Sep 20.
Biomolecular simulations often suffer from the "time scale problem", which hinders the study of rare events that occur over long time scales. Enhanced sampling techniques aim to alleviate this issue by accelerating conformational transitions, yet they typically require well-defined collective variables (CVs), whose identification poses a significant challenge. Machine learning offers promising solutions but typically requires rich training data covering the entire free energy surface (FES). In this work, we introduce an automated iterative pipeline designed to mitigate these limitations. Our protocol first uses a CV-free, count-based adaptive sampling method to generate a data set rich in rare events. From this data set, slow modes are identified using Koopman-reweighted time-lagged independent component analysis (KTICA); these modes are then used by on-the-fly probability enhanced sampling (OPES) to explore the FES efficiently. We demonstrate the effectiveness of the pipeline, and compare it with the common Markov State Model (MSM) approach, on two model systems of increasing complexity: alanine dipeptide (Ala2) and deca-alanine (Ala10). The results underscore the pipeline's applicability across diverse biomolecular simulations.
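The first two stages described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `least_counts_seeds` and `tica` are hypothetical, the discrete state labels are assumed to come from some prior clustering step, and the second function is plain TICA without the Koopman reweighting used in KTICA. The OPES stage is not shown; in practice the resulting projections would be passed as CVs to an enhanced sampling engine.

```python
import numpy as np


def least_counts_seeds(labels, n_seeds):
    """Count-based adaptive sampling step (illustrative).

    labels: 1D sequence of discrete state indices, one per frame
            (assumed to come from a prior clustering of the trajectories).
    Returns frame indices from which new simulations could be restarted,
    taken from the least-visited states.
    """
    labels = np.asarray(labels)
    states, counts = np.unique(labels, return_counts=True)
    # Stable sort: ties between equally rare states are broken by state index.
    order = np.argsort(counts, kind="stable")
    chosen = states[order[:n_seeds]]
    # Use the last frame observed in each chosen state as its seed.
    return [int(np.flatnonzero(labels == s)[-1]) for s in chosen]


def tica(X, lag, n_components=2):
    """Plain TICA (no Koopman reweighting) on one trajectory.

    X: array of shape (n_frames, n_features); lag: lag time in frames.
    Returns (eigenvalues, components), slowest mode first. Projecting the
    mean-free data onto `components` gives candidate slow CVs.
    """
    X = np.asarray(X, dtype=float)
    if not 0 < lag < len(X):
        raise ValueError("lag must be between 1 and n_frames - 1")
    X = X - X.mean(axis=0)
    A, B = X[:-lag], X[lag:]
    n = len(A)
    # Symmetrized instantaneous and time-lagged covariance matrices.
    C0 = (A.T @ A + B.T @ B) / (2 * n)
    Ct = (A.T @ B + B.T @ A) / (2 * n)
    # Whiten with C0, dropping near-singular directions.
    s, U = np.linalg.eigh(C0)
    keep = s > 1e-10 * s.max()
    W = U[:, keep] / np.sqrt(s[keep])
    # Solve the symmetric eigenproblem in the whitened space.
    lam, V = np.linalg.eigh(W.T @ Ct @ W)
    idx = np.argsort(lam)[::-1][:n_components]
    return lam[idx], W @ V[:, idx]
```

For example, `least_counts_seeds([0, 0, 0, 1, 2, 2, 0, 1, 2, 2], 2)` returns `[7, 6]`: state 1 is the rarest, and the tie between states 0 and 2 is broken in favor of the lower index. In an iterative scheme, new short simulations would be launched from such seeds, the pooled data re-clustered, and the selection repeated before the slow modes are estimated.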