Liu Bojun, Boysen Jordan G, Unarta Ilona Christy, Du Xuefeng, Li Yixuan, Huang Xuhui
Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA.
Data Science Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA.
Nat Commun. 2025 Jan 2;16(1):349. doi: 10.1038/s41467-024-55228-4.
Identifying transition states is crucial for understanding the protein conformational changes that underlie numerous biological processes. Markov state models (MSMs), built from molecular dynamics (MD) simulations, capture these dynamics as transitions among metastable conformational states and have been used successfully to study protein conformational changes. However, MSMs struggle to identify transition states: they partition MD conformations into discrete metastable states (free energy minima) and therefore lack a description of the transition states located at the free energy barriers. Here, we introduce Transition State identification via Dispersion and vAriational principle Regularized neural networks (TS-DAR), a deep learning framework inspired by out-of-distribution (OOD) detection in trustworthy artificial intelligence (AI). TS-DAR offers an end-to-end pipeline that simultaneously detects all transition states between multiple free energy minima from MD simulations, using regularized hyperspherical embeddings in latent space. The key insight of TS-DAR is to treat transition state structures as OOD data, recognizing that they are sparsely populated and exhibit a distributional shift from metastable states. We demonstrate the power of TS-DAR by applying it to a 2D potential, alanine dipeptide, and the translocation of a DNA motor protein on DNA, where it outperforms previous methods in identifying transition states.
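The idea of treating sparsely populated, distribution-shifted conformations as OOD data on a hypersphere can be illustrated with a toy sketch. This is not the TS-DAR implementation: the scoring rule (maximum cosine similarity to a set of state centers), the threshold, the function names, and the 2D data are all assumptions chosen for illustration, and the abstract does not specify how TS-DAR computes its scores.

```python
import math


def normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def max_cosine_to_centers(embedding, centers):
    """Largest cosine similarity between one embedding and any state center."""
    z = normalize(embedding)
    return max(
        sum(a * b for a, b in zip(z, normalize(c)))
        for c in centers
    )


def flag_ood(embeddings, centers, threshold):
    """Indices of frames whose best similarity to any center is below threshold."""
    return [
        i for i, e in enumerate(embeddings)
        if max_cosine_to_centers(e, centers) < threshold
    ]


# Hypothetical latent vectors (assumed data, not from the paper):
# two metastable-state centers and four MD frames.
centers = [[1.0, 0.0], [0.0, 1.0]]
embeddings = [
    [0.9, 0.1],    # close to center 0
    [0.1, 0.95],   # close to center 1
    [0.7, 0.7],    # halfway between the two centers
    [1.0, 0.05],   # close to center 0
]

# The threshold is an arbitrary illustrative choice.
candidates = flag_ood(embeddings, centers, threshold=0.9)
print(candidates)
```

In this toy setting, only the frame that sits between the two centers is flagged, which mirrors the intuition that transition state structures resemble neither metastable state closely.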