How good is generative diffusion model for enhanced sampling of protein conformations across scales and in all-atom resolution?

Authors

Bera Palash, Mondal Jagannath

Affiliation

Tata Institute of Fundamental Research Hyderabad, Hyderabad, Telangana 500046, India.

Publication

J Chem Phys. 2025 Sep 21;163(11). doi: 10.1063/5.0279756.

Abstract

Molecular dynamics (MD) simulations are fundamental for probing the structural dynamics of biomolecules, yet their efficiency is limited by the high computational cost of exploring long-timescale events. Generative machine learning (ML) models, particularly the Denoising Diffusion Probabilistic Model (DDPM), offer an emerging strategy to enhance conformational sampling. In this study, we evaluate the capabilities and limitations of DDPM in generating atomistically accurate conformational ensembles across proteins of varying size and structural order, ranging from the 20-residue folded Trp-cage and 58-residue BPTI to the 83-residue intrinsically disordered region Ash1 and the 140-residue intrinsically disordered protein α-Synuclein. Training DDPM on relatively short MD trajectories using both torsion angle and all-atom coordinate data, we demonstrate that it can reproduce key structural features such as secondary structure, radius of gyration, and contact maps, while effectively sampling sparsely populated regions of the conformational landscape. Notably, DDPM can also generate novel conformations, including transitions not explicitly observed in the training data. However, the model occasionally overlooks low-probability regions and may produce conformers with unclear physical relevance, warranting independent validation. These limitations are particularly evident in flexible systems such as IDPs. Overall, this work benchmarks DDPM as a viable tool for augmenting MD simulations, offering enhanced sampling with significant computational savings, while noting its limitations in capturing low-populated conformers. At the same time, it highlights the importance of rigorous validation and thoughtful interpretation when deploying generative models in computational biophysics.
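The abstract names radius of gyration and contact maps among the structural features used to compare generated ensembles with MD. As a rough illustration of what such a comparison measures, here is a minimal sketch using only the Python standard library. It is not the authors' analysis code: the function names are hypothetical, the radius of gyration is unweighted (no atomic masses), and the 8.0 distance cutoff (in the same units as the coordinates) is an assumed, commonly used value rather than one taken from the paper.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) points.

    Assumption: every point has equal mass; a mass-weighted version
    would weight each term by the atomic mass.
    """
    n = len(coords)
    center = [sum(p[i] for p in coords) / n for i in range(3)]
    sq_sum = sum(
        sum((p[i] - center[i]) ** 2 for i in range(3)) for p in coords
    )
    return math.sqrt(sq_sum / n)


def contact_map(coords, cutoff=8.0):
    """Binary contact map: 1 where two points lie within `cutoff`.

    Assumption: `cutoff` is an illustrative value, not from the paper.
    """
    n = len(coords)
    return [
        [1 if math.dist(coords[i], coords[j]) <= cutoff else 0
         for j in range(n)]
        for i in range(n)
    ]


def mean_contact_map(ensemble, cutoff=8.0):
    """Average the binary contact maps over all frames of an ensemble."""
    maps = [contact_map(frame, cutoff) for frame in ensemble]
    n = len(maps[0])
    return [
        [sum(m[i][j] for m in maps) / len(maps) for j in range(n)]
        for i in range(n)
    ]


# Toy two-point "frames" standing in for per-residue coordinates.
frame_a = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
frame_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

rg_a = radius_of_gyration(frame_a)
avg_map = mean_contact_map([frame_a, frame_b], cutoff=1.5)
```

In practice one would compute these quantities separately for the MD ensemble and the generated ensemble, then compare the radius-of-gyration distributions and the averaged contact maps.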
