Unsupervised Learning of Progress Coordinates during Weighted Ensemble Simulations: Application to NTL9 Protein Folding.

Authors

Leung Jeremy M G, Frazee Nicolas C, Brace Alexander, Bogetti Anthony T, Ramanathan Arvind, Chong Lillian T

Affiliations

Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States.

Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States.

Publication information

J Chem Theory Comput. 2025 Apr 8;21(7):3691-3699. doi: 10.1021/acs.jctc.4c01136. Epub 2025 Mar 19.

DOI:10.1021/acs.jctc.4c01136

PMID:40105797

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11983707/

Abstract

A major challenge for many rare-event sampling strategies is the identification of progress coordinates that capture the slowest relevant motions. Machine-learning methods that can identify progress coordinates in an unsupervised manner have therefore been of great interest to the simulation community. Here, we developed a general method for identifying progress coordinates "on-the-fly" during weighted ensemble (WE) rare-event sampling via deep learning (DL) of outliers among sampled conformations. Our method identifies outliers in a latent space model of the system's sampled conformations that is periodically trained using a convolutional variational autoencoder. As a proof of principle, we applied our DL-enhanced WE method to simulate the NTL9 protein folding process. To enable rapid tests, our simulations propagated discrete-state synthetic molecular dynamics trajectories using a generative, fine-grained Markov state model. Results revealed that our on-the-fly DL of outliers enhanced the efficiency of WE by >3-fold in estimating the folding rate constant. Our efforts are a significant step forward in the unsupervised learning of slow coordinates during rare event sampling.
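The outlier-driven resampling idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how outliers are scored or how trajectories are resampled, so everything below is an assumption made for illustration: the latent coordinates are taken as already produced by some trained encoder, the outlier score is a plain k-nearest-neighbor distance, and the function names `knn_outlier_scores` and `split_and_merge` are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def knn_outlier_scores(latent, k=3):
    """Mean distance from each latent point to its k nearest neighbors.

    Larger values mean the point is more isolated (more outlying).
    Assumed scoring rule; the source does not specify one.
    """
    diffs = latent[:, None, :] - latent[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # ignore each point's distance to itself
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)


def split_and_merge(weights, scores, n_split=1, rng=None):
    """Split the most outlying walkers and merge the least outlying ones.

    Each split replaces one walker with two children of half its weight.
    Each merge replaces two walkers with one survivor carrying their summed
    weight; the survivor is drawn with probability proportional to weight.
    Walker count and total weight are both conserved.
    Returns (parent_indices, new_weights).
    """
    weights = np.asarray(weights, dtype=float)
    if 3 * n_split > len(weights):
        raise ValueError("need at least 3 * n_split walkers")
    rng = np.random.default_rng() if rng is None else rng

    order = np.argsort(scores)            # ascending: least outlying first
    merge_ids = order[: 2 * n_split]      # candidates to merge pairwise
    split_ids = order[len(order) - n_split:]
    keep_ids = order[2 * n_split: len(order) - n_split]

    parents, new_w = [], []
    for i in keep_ids:                    # untouched walkers
        parents.append(i)
        new_w.append(weights[i])
    for i in split_ids:                   # two children, half weight each
        parents += [i, i]
        new_w += [weights[i] / 2.0] * 2
    for a, b in merge_ids.reshape(-1, 2):  # one survivor, summed weight
        total = weights[a] + weights[b]
        survivor = a if rng.random() < weights[a] / total else b
        parents.append(survivor)
        new_w.append(total)
    return np.array(parents), np.array(new_w)
```

In this sketch the resampling step only redistributes weight among trajectories, so the weights still sum to one afterward; that conservation is the property that lets a weighted ensemble concentrate effort on rarely visited conformations without biasing the dynamics.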

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/199c/11983707/89ef1777a423/ct4c01136_0001.jpg
