Leung Jeremy M G, Frazee Nicolas C, Brace Alexander, Bogetti Anthony T, Ramanathan Arvind, Chong Lillian T
Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States.
Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States.
J Chem Theory Comput. 2025 Apr 8;21(7):3691-3699. doi: 10.1021/acs.jctc.4c01136. Epub 2025 Mar 19.
A major challenge for many rare-event sampling strategies is the identification of progress coordinates that capture the slowest relevant motions. Machine-learning methods that can identify progress coordinates in an unsupervised manner have therefore been of great interest to the simulation community. Here, we developed a general method for identifying progress coordinates "on-the-fly" during weighted ensemble (WE) rare-event sampling via deep learning (DL) of outliers among sampled conformations. Our method identifies outliers in a latent space model of the system's sampled conformations; this model is periodically retrained using a convolutional variational autoencoder. As a proof of principle, we applied our DL-enhanced WE method to simulate the NTL9 protein folding process. To enable rapid tests, our simulations propagated discrete-state synthetic molecular dynamics trajectories using a generative, fine-grained Markov state model. Results revealed that our on-the-fly DL of outliers enhanced the efficiency of WE by >3-fold in estimating the folding rate constant. Our efforts are a significant step forward in the unsupervised learning of slow coordinates during rare-event sampling.
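The outlier-identification step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which outlier score is used, so the mean distance to the k nearest neighbors in latent space is an assumed stand-in here, and the latent vectors are synthetic points rather than the output of a convolutional variational autoencoder. The function names `knn_outlier_scores` and `select_outliers` are hypothetical and chosen for illustration only.

```python
import numpy as np


def knn_outlier_scores(latent, k=5):
    """Score each latent point by its mean distance to its k nearest neighbors.

    Assumption: this k-NN distance score stands in for whatever outlier
    measure the actual method uses; larger scores mean more isolated points.
    """
    diff = latent[:, None, :] - latent[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude each point's distance to itself
    nearest = np.sort(dist, axis=1)[:, :k]
    return nearest.mean(axis=1)


def select_outliers(latent, k=5, n_outliers=5):
    """Return indices of the n_outliers highest-scoring points, most isolated first."""
    scores = knn_outlier_scores(latent, k=k)
    return np.argsort(scores)[::-1][:n_outliers]


# Synthetic stand-in for latent embeddings of sampled conformations:
# 200 points in a dense cluster (frequently visited conformations) ...
rng = np.random.default_rng(0)
dense = rng.normal(loc=0.0, scale=0.5, size=(200, 2))
# ... plus 5 isolated points (rarely visited conformations), indices 200-204.
isolated = np.array(
    [[8.0, 8.0], [-9.0, 7.0], [10.0, -10.0], [-8.0, -9.0], [12.0, 0.0]]
)
latent = np.vstack([dense, isolated])

outlier_idx = select_outliers(latent, k=5, n_outliers=5)
print(sorted(outlier_idx.tolist()))
```

In a WE setting, conformations flagged this way would presumably be the ones favored for continued sampling, with the latent model retrained periodically as new conformations accumulate. How the outlier ranking feeds into trajectory splitting and merging is not specified in the abstract and is not modeled in this sketch.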