Kleiman Diego E, Feng Jiangyan, Xue Zhengyuan, Shukla Diwakar
Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
bioRxiv. 2025 Aug 24:2025.08.20.671365. doi: 10.1101/2025.08.20.671365.
Understanding conformational dynamics is essential for elucidating protein function, yet most deep learning models in structural biology predict only static structures. Here, we introduce ESMDynamic, a deep learning model that predicts dynamic residue-residue contact probability maps directly from protein sequence. Built on the ESMFold architecture, ESMDynamic is trained on contact fluctuations from experimental structure ensembles and molecular dynamics (MD) simulations, enabling it to capture diverse modes of structural variability without requiring multiple sequence alignments. We benchmark ESMDynamic on two large-scale MD datasets (mdCATH and ATLAS), showing that it matches or outperforms state-of-the-art ensemble prediction models (AlphaFlow, ESMFlow, BioEmu) for transient contact prediction while offering orders-of-magnitude faster inference. We demonstrate the model on the ASCT2 and SWEET2b transporters, a troponin C design, and the HIV-1 protease homodimer, illustrating generalization to unseen systems and recovery of experimentally validated dynamic contacts. Furthermore, we present an automated pipeline using ESMDynamic predictions to select collective variables for Markov State Model construction, producing high-quality kinetic models from unbiased MD simulations of SWEET2b. Overall, ESMDynamic provides a compact and interpretable sequence-based description of conformational dynamics, with broad applications in protein engineering, functional analysis, and simulation-guided discovery.
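To make the notion of a dynamic (transient) contact concrete, the following is a minimal sketch of how such labels could be derived from a structural ensemble and then ranked as candidate collective variables. It is not the authors' implementation: the 8 Å cutoff, the 0.1/0.9 frequency thresholds, the minimum sequence separation, the Bernoulli-variance ranking score, and all function names are illustrative assumptions that the abstract does not specify.

```python
import numpy as np


def contact_frequencies(coords, cutoff=8.0):
    """Fraction of frames in which each residue pair is in contact.

    coords: array of shape (n_frames, n_residues, 3), one representative
    atom per residue, in angstroms. The cutoff is an assumed value.
    """
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)        # (n_frames, n_res, n_res)
    return (dist < cutoff).mean(axis=0)         # (n_res, n_res)


def dynamic_contact_map(freq, low=0.1, high=0.9, min_seq_sep=3):
    """Mark pairs that are in contact in some frames but not in others.

    The thresholds and sequence separation are assumptions for illustration.
    """
    idx = np.arange(freq.shape[0])
    sep = np.abs(idx[:, None] - idx[None, :])
    return (freq > low) & (freq < high) & (sep >= min_seq_sep)


def top_contact_pairs(scores, k=5, min_seq_sep=3):
    """Return the k highest-scoring residue pairs (i < j) as candidate CVs."""
    i, j = np.triu_indices(scores.shape[0], k=min_seq_sep)
    order = np.argsort(scores[i, j])[::-1][:k]
    return [(int(i[o]), int(j[o])) for o in order]


# Toy ensemble: residues 0-3 are fixed and far apart; residue 4 sits next to
# residue 0 in half of the frames and far from everything in the other half.
coords = np.zeros((4, 5, 3))
coords[:, 1, 0] = 20.0
coords[:, 2, 0] = 40.0
coords[:, 3, 0] = 60.0
coords[:2, 4, 1] = 5.0    # frames 0-1: 5 angstroms from residue 0
coords[2:, 4, 1] = 50.0   # frames 2-3: out of contact

freq = contact_frequencies(coords)
dynamic = dynamic_contact_map(freq)
# Bernoulli variance f * (1 - f) peaks for contacts present in half the frames.
candidates = top_contact_pairs(freq * (1.0 - freq), k=1)
```

In this toy ensemble only the pair (0, 4) forms and breaks, so it is the only pair marked as dynamic and the top-ranked candidate. A sequence-based predictor such as the one described above would supply the probability matrix directly, without needing the ensemble.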