Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States.
Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States.
J Chem Theory Comput. 2023 Jul 25;19(14):4377-4388. doi: 10.1021/acs.jctc.3c00040. Epub 2023 Apr 7.
Rapid computational exploration of the free energy landscape of biological molecules remains an active area of research because rare state transitions are difficult to sample in molecular dynamics (MD) simulations. In recent years, a growing number of studies have used machine learning (ML) models to enhance and analyze MD simulations. Notably, unsupervised models that extract kinetic information from a set of parallel trajectories have been proposed, including the variational approach for Markov processes (VAMP), VAMPNets, and time-lagged variational autoencoders (TVAE). In this work, we propose combining adaptive sampling with active learning of kinetic models to accelerate the discovery of the conformational landscape of biomolecules. In particular, we introduce and compare several techniques that combine kinetic models with two adaptive sampling regimes (least counts and multiagent reinforcement learning-based adaptive sampling) to enhance the exploration of conformational ensembles without introducing biasing forces. Moreover, inspired by the active learning approach of uncertainty-based sampling, we present MaxEnt VAMPNet. This technique restarts simulations from the microstates that maximize the Shannon entropy of the output of a VAMPNet trained to perform soft discretization of metastable states. By running simulations on two test systems, the WLALL pentapeptide and the villin headpiece subdomain, we empirically demonstrate that MaxEnt VAMPNet explores conformational landscapes faster than the baseline and the other proposed methods.
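The two restart-selection rules named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the soft state assignments of a trained VAMPNet are already available as an array of probabilities with one row per candidate microstate, and that a hard cluster label is available for each sampled frame. The function names (shannon_entropy, maxent_restarts, least_counts_restarts) are hypothetical and chosen only for this example.

```python
import numpy as np


def shannon_entropy(probs, eps=1e-12):
    """Row-wise Shannon entropy (in nats) of soft state assignments."""
    # Clip to avoid log(0); assumes each row already sums to 1.
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    return -np.sum(p * np.log(p), axis=1)


def maxent_restarts(probs, n_restarts):
    """Indices of the candidates whose soft assignment is most uncertain.

    Assumption: `probs` has shape (n_candidates, n_states) and holds the
    softmax output of a trained VAMPNet for each candidate microstate.
    """
    entropy = shannon_entropy(probs)
    # Sort by decreasing entropy; a stable sort keeps ties in input order.
    order = np.argsort(-entropy, kind="stable")
    return order[:n_restarts]


def least_counts_restarts(labels, n_restarts):
    """Index of one frame from each of the least-populated clusters.

    Assumption: `labels` holds one hard cluster label per sampled frame.
    The first frame seen in each cluster serves as its representative.
    """
    labels = np.asarray(labels)
    _, first_idx, counts = np.unique(
        labels, return_index=True, return_counts=True
    )
    # Sort clusters by increasing population.
    order = np.argsort(counts, kind="stable")
    return first_idx[order[:n_restarts]]
```

In this sketch, a candidate assigned with near-uniform probabilities across metastable states has entropy close to the logarithm of the number of states and is selected first, while a candidate assigned to a single state with probability near 1 has entropy close to zero and is selected last. Least counts, by contrast, ignores the kinetic model's uncertainty and favors clusters that have been visited the fewest times.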