Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore 94550, California, United States.
Center for Applied Mathematics, Cornell University, Ithaca 14853, New York, United States.
J Chem Theory Comput. 2024 Oct 22;20(20):8795-8806. doi: 10.1021/acs.jctc.4c00816. Epub 2024 Oct 10.
Resolving the intricate details of biological phenomena at the molecular level is fundamentally limited by the length and time scales that can be probed experimentally. Molecular dynamics (MD) simulations at various scales are powerful tools, frequently employed to offer biological insights beyond experimental resolution. However, while it is relatively simple to observe long-lived, stable configurations of molecules such as proteins at the required spatial resolution, simulating the more interesting rare transitions between such states often takes orders of magnitude longer than is feasible even on the largest supercomputers available today. One common aspect of this challenge is pathway discovery, where the start and end states of a scientific phenomenon are known or can be approximated, but the mechanistic details in between are unknown. Here, we propose a representation-learning-based solution that uses interpolation and extrapolation in an abstract representation space to synthesize potential transition states, which are automatically validated using MD simulations. The new simulations of the synthesized transition states are subsequently incorporated into the representation learning, leading to an iterative framework for targeted path sampling. We demonstrate the approach by recovering the transition of a RAS-RAF protein domain (CRD) from a membrane-free state to a membrane-interacting state using coarse-grained MD simulations.
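The interpolation and extrapolation step described above can be illustrated with a minimal sketch. The abstract does not specify how points are generated in the representation space, so the straight-line scheme below is an assumption, and every name in it (`synthesize_latent_candidates`, `sampling_round`, `decode`, `is_stable`) is hypothetical rather than taken from the authors' code. The `decode` and `is_stable` callables stand in for the learned decoder and the MD-based validation, neither of which is implemented here.

```python
import numpy as np

def synthesize_latent_candidates(z_start, z_end, n_interp=5, extrap=0.25):
    """Return points on the straight line through two latent vectors.

    Assumption: linear interpolation/extrapolation; the source does not
    state which scheme is used in the representation space.
    """
    z_start = np.asarray(z_start, dtype=float)
    z_end = np.asarray(z_end, dtype=float)
    # t in [0, 1] interpolates between the known states;
    # t < 0 or t > 1 extrapolates beyond them.
    t = np.concatenate(([-extrap], np.linspace(0.0, 1.0, n_interp), [1.0 + extrap]))
    return z_start + t[:, None] * (z_end - z_start)

def sampling_round(z_start, z_end, decode, is_stable, **kwargs):
    """One hypothetical round: decode each candidate and keep the accepted ones.

    `decode` maps a latent vector to a configuration and `is_stable` is a
    placeholder for validation by a short MD simulation.
    """
    candidates = synthesize_latent_candidates(z_start, z_end, **kwargs)
    return [z for z in candidates if is_stable(decode(z))]
```

In an iterative setting, the accepted candidates would be simulated, added to the training data, and the representation retrained before the next round; that loop is left out of the sketch because its details are not given in the abstract.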