William A. Brookshire Department of Chemical and Biomolecular Engineering, University of Houston, Houston, Texas 77204, United States.
Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States.
J Chem Theory Comput. 2024 Oct 22;20(20):9178-9189. doi: 10.1021/acs.jctc.4c00669. Epub 2024 Oct 7.
The folding and unfolding of RNA stem-loops are critical biological processes; however, computational studies of these processes are often hampered by the ruggedness of the folding landscape, which necessitates long simulation times at the atomistic scale. Here, we adapted DeepDriveMD (DDMD), an advanced deep learning-driven sampling technique originally developed for protein folding, to address the challenges of RNA stem-loop folding. Although tempering- and order parameter-based techniques are commonly used for similar rare-event problems, their computational cost or their need for a priori knowledge about the system often limits their effective use. DDMD overcomes these challenges by adaptively learning from an ensemble of running MD simulations, using generic contact maps as the raw input. DDMD learns a low-dimensional latent representation on the fly and guides the simulations toward undersampled regions while optimizing the resources spent on exploring the relevant parts of phase space. We showed that DDMD estimates the free energy landscape of the RNA stem-loop reasonably well at room temperature. Our simulation framework runs at a constant temperature without an external biasing potential, and hence preserves the information on transition rates, at a computational cost much lower than that of simulations performed with external biasing potentials. We also introduced a reweighting strategy for obtaining unbiased free energy surfaces and presented a qualitative analysis of the latent space. This analysis showed that the latent space captures the relevant slow degrees of freedom for the RNA folding problem of interest. Finally, throughout the manuscript, we outlined how the different parameters were selected and optimized to adapt DDMD to this system. We believe this compendium of decision-making processes will help new users adapt this technique to the rare-event sampling problems of their interest.
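As a rough illustration of two ideas named in the abstract (contact maps as raw input, and steering new simulations toward undersampled regions of a latent space), the following is a minimal sketch and not the DeepDriveMD implementation. The function names, the 8.0 cutoff, and the k-nearest-neighbour sparsity score are all assumptions made for illustration; the abstract does not specify which contact definition, learned model, or selection rule the authors used, and the latent coordinates here are simply taken as given rather than learned.

```python
import numpy as np


def contact_map(coords, cutoff=8.0):
    """Binary contact map for one frame.

    coords: (n_sites, 3) array of positions.
    cutoff: hypothetical distance threshold, in the same units as coords.
    Returns an (n_sites, n_sites) uint8 array with 1 where two sites are
    closer than the cutoff (the diagonal is therefore always 1).
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return (dist < cutoff).astype(np.uint8)


def least_sampled(latent, n_select, k=5):
    """Indices of the frames that sit in the sparsest latent regions.

    latent: (n_frames, n_dims) array of latent coordinates (assumed given).
    Sparsity is scored as the mean distance to the k nearest neighbours,
    a simple stand-in for whatever selection rule a real workflow uses.
    Requires n_frames > k.
    """
    d = np.linalg.norm(latent[:, None, :] - latent[None, :, :], axis=-1)
    d.sort(axis=1)                      # column 0 is the self-distance (0)
    score = d[:, 1:k + 1].mean(axis=1)  # mean distance to k nearest neighbours
    return np.argsort(score)[::-1][:n_select]
```

In a workflow of this kind, frames returned by a function like `least_sampled` would serve as starting points for the next round of unbiased MD simulations, which is how sampling can be redirected without adding a biasing potential.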