Jones Michael S, Khanna Smayan, Ferguson Andrew L
Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States.
Department of Chemistry, University of Chicago, Chicago, Illinois 60637, United States.
J Chem Inf Model. 2025 Jan 27;65(2):672-692. doi: 10.1021/acs.jcim.4c02046. Epub 2025 Jan 7.
Coarse-grained models have become ubiquitous in biomolecular modeling tasks aimed at studying slow dynamical processes such as protein folding and DNA hybridization. These models can considerably accelerate sampling, but it remains challenging to accurately and efficiently restore all-atom detail to the coarse-grained trajectory, which can be vital for a detailed understanding of molecular mechanisms and for calculating observables that depend on all-atom coordinates. In this work, we introduce FlowBack, a deep generative model that employs a flow-matching objective to map samples from a coarse-grained prior distribution to an all-atom data distribution. We construct our prior distribution to be agnostic to the coarse-grained map and molecular type. A protein-specific model trained on ∼65k structures from the Protein Data Bank achieves state-of-the-art performance on structural metrics compared to previous generative and rule-based approaches in applications to static PDB structures, all-atom simulations of fast-folding proteins, and coarse-grained trajectories generated by a machine-learned force field. A DNA-protein model trained on ∼1.5k DNA-protein complexes achieves excellent reconstruction and generative capabilities on static DNA-protein complexes from the Protein Data Bank as well as on out-of-distribution coarse-grained dynamical simulations of DNA-protein complexation. FlowBack offers an accurate, efficient, and easy-to-use tool to recover all-atom structures from coarse-grained molecular simulations with higher robustness and fewer steric clashes than previous approaches. We make FlowBack freely available to the community as an open-source Python package.
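The flow-matching idea summarized in the abstract can be illustrated with a small NumPy sketch. This is not the FlowBack package or its API. It assumes a prior that places each atom at its parent coarse-grained bead plus isotropic Gaussian noise, a linear interpolation path between prior and data samples, and Euler integration at inference time. The function names (`sample_prior`, `flow_matching_pair`, `euler_integrate`), the noise scale `sigma`, and the toy bead-to-atom mapping are all hypothetical, and an oracle velocity field stands in for the trained neural network.

```python
import numpy as np


def sample_prior(cg_positions, atom_to_bead, sigma, rng):
    """Draw a prior sample: each atom sits at its parent bead plus Gaussian noise.

    Assumption: an isotropic Gaussian of width `sigma` around the bead position.
    cg_positions: (n_beads, 3) array; atom_to_bead: (n_atoms,) integer array.
    """
    centers = cg_positions[atom_to_bead]  # (n_atoms, 3)
    return centers + sigma * rng.standard_normal(centers.shape)


def flow_matching_pair(x0, x1, t):
    """Return the linear interpolant x_t and its regression target x1 - x0.

    A velocity network v(x_t, t) would be trained to minimize
    mean((v(x_t, t) - (x1 - x0)) ** 2) over sampled (x0, x1, t).
    """
    x_t = (1.0 - t) * x0 + t * x1
    return x_t, x1 - x0


def euler_integrate(x0, velocity_fn, n_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x


# Toy system (hypothetical): 2 beads, 3 atoms per bead.
rng = np.random.default_rng(0)
cg_positions = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0]])
atom_to_bead = np.array([0, 0, 0, 1, 1, 1])

# Stand-in "all-atom" target; in practice this comes from training data.
x1 = cg_positions[atom_to_bead] + 0.5 * rng.standard_normal((6, 3))
x0 = sample_prior(cg_positions, atom_to_bead, sigma=0.1, rng=rng)


def oracle_velocity(x, t):
    # Oracle in place of a trained network: points straight at the known target.
    return (x1 - x) / (1.0 - t)


x_generated = euler_integrate(x0, oracle_velocity, n_steps=10)
```

With the oracle field the integration lands on the target structure, which only checks the mechanics. A learned velocity field would instead generate all-atom configurations conditioned on the coarse-grained coordinates through the prior.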