Deep clustering of protein folding simulations.

Affiliations

Computational Science and Engineering Division, Oak Ridge National Laboratory, One Bethel Valley Road, MS6085, Oak Ridge, TN, USA.

Publication information

BMC Bioinformatics. 2018 Dec 21;19(Suppl 18):484. doi: 10.1186/s12859-018-2507-5.

DOI: 10.1186/s12859-018-2507-5

PMID: 30577777

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6302667/

Abstract

BACKGROUND

We examine the problem of clustering biomolecular simulations using deep learning techniques. Because biomolecular simulation datasets are inherently high-dimensional, it is often necessary to build low-dimensional representations that can be used to extract quantitative insights into the atomistic mechanisms underlying complex biological processes.
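The abstract does not spell out how a simulation frame is turned into model input. As an illustration only, assuming each frame is featurized as a binary contact map over Cα atoms with a distance cutoff (the 8 Å default and the function name `contact_map` below are hypothetical choices, not taken from the paper), an N-residue protein yields an N × N matrix per frame, which is why long trajectories become very high-dimensional datasets:

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_map(ca_coords: Sequence[Coord], cutoff: float = 8.0) -> List[List[int]]:
    """Return an N x N binary contact map for one frame.

    Entry (i, j) is 1 when the C-alpha atoms of residues i and j lie closer
    than `cutoff` angstroms, else 0. The 8.0 default is an assumed value.
    """
    n = len(ca_coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # the map is symmetric, so fill both halves at once
            if math.dist(ca_coords[i], ca_coords[j]) < cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


# Toy 4-residue "chain" with 3.8 angstrom spacing between consecutive C-alpha atoms.
frame = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
cmap = contact_map(frame)
for row in cmap:
    print(row)
print("features per frame:", len(cmap) * len(cmap))
```

Every frame of a trajectory is mapped this way, so the feature count grows with the square of the residue count; a model such as a variational autoencoder then compresses each map to a handful of latent coordinates.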

RESULTS

We use a convolutional variational autoencoder (CVAE) to learn low-dimensional, biophysically relevant latent features from long time-scale protein folding simulations in an unsupervised manner. We demonstrate our approach on three model protein folding systems, namely Fs-peptide (14 μs aggregate sampling), villin headpiece (single trajectory of 125 μs) and β-β-α (BBA) protein (223 + 102 μs sampling across two independent trajectories). In these systems, we show that the latent features learned by the CVAE correspond to distinct conformational substates along the protein folding pathways. On average, the CVAE model correctly predicts nearly 89% of all contacts within the folding trajectories, while extracting folded, unfolded and potentially misfolded states in an unsupervised manner. Further, the latent features of protein folding learned by the CVAE model can be applied to other independent trajectories, making it particularly attractive for identifying intrinsic features of conformational substates that share similar structural features.
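The abstract names the model but not its training objective. The sketch below shows the standard variational autoencoder objective (reconstruction error plus a KL penalty pulling the latent code toward a standard normal), which a CVAE optimizes regardless of its convolutional layers. The convolutional encoder and decoder are not shown: `probs`, `mu`, and `log_var` stand in for their outputs, and all function names are hypothetical. Because the abstract does not define how the 89% figure is computed, `contact_accuracy` assumes one plausible reading: the fraction of contact-map entries whose thresholded reconstruction matches the input.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float], rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (the reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def binary_cross_entropy(target: Sequence[int], probs: Sequence[float], eps: float = 1e-7) -> float:
    """Reconstruction term: BCE between a flattened 0/1 contact map and decoder probabilities."""
    total = 0.0
    for t, p in zip(target, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total


def vae_loss(target, probs, mu, log_var) -> float:
    """Negative ELBO: reconstruction error plus KL regularizer on the latent code."""
    return binary_cross_entropy(target, probs) + kl_to_standard_normal(mu, log_var)


def contact_accuracy(target: Sequence[int], probs: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of contact-map entries whose thresholded prediction matches the input."""
    predicted = [1 if p >= threshold else 0 for p in probs]
    return sum(int(t == q) for t, q in zip(target, predicted)) / len(target)


# Toy example: a flattened 2 x 2 contact map, decoder outputs, and a 2-D latent code.
target = [1, 0, 0, 1]
probs = [0.9, 0.2, 0.6, 0.8]
mu, log_var = [0.1, -0.2], [-1.0, -0.5]
z = reparameterize(mu, log_var, random.Random(0))
print("latent sample:", z)
print("loss:", vae_loss(target, probs, mu, log_var))
print("accuracy:", contact_accuracy(target, probs))
```

The KL term is what keeps the latent space smooth enough that nearby points decode to similar contact maps, which is the property that lets conformations with shared structural features group together in the learned coordinates.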

CONCLUSIONS

Together, we show that the CVAE model can quantitatively describe complex biophysical processes such as protein folding.

Similar articles

Variational embedding of protein folding simulations using Gaussian mixture variational autoencoders.

J Chem Phys. 2021 Nov 21;155(19):194108. doi: 10.1063/5.0069708.

Unsupervised feature learning for electrocardiogram data using the convolutional variational autoencoder.

PLoS One. 2021 Dec 1;16(12):e0260612. doi: 10.1371/journal.pone.0260612. eCollection 2021.

Unsupervised Learning of Progress Coordinates during Weighted Ensemble Simulations: Application to NTL9 Protein Folding.

J Chem Theory Comput. 2025 Apr 8;21(7):3691-3699. doi: 10.1021/acs.jctc.4c01136. Epub 2025 Mar 19.

Toward a Benchmark for Markov State Models: The Folding of HP35.

J Phys Chem Lett. 2023 Aug 10;14(31):6956-6967. doi: 10.1021/acs.jpclett.3c01561. Epub 2023 Jul 28.

LAST: Latent Space-Assisted Adaptive Sampling for Protein Trajectories.

J Chem Inf Model. 2023 Jan 9;63(1):67-75. doi: 10.1021/acs.jcim.2c01213. Epub 2022 Dec 6.

Structured pathway across the transition state for peptide folding revealed by molecular dynamics simulations.

PLoS Comput Biol. 2011 Sep;7(9):e1002137. doi: 10.1371/journal.pcbi.1002137. Epub 2011 Sep 8.

Mapping conformational landscape in protein folding: Benchmarking dimensionality reduction and clustering techniques on the Trp-Cage mini-protein.

Biophys Chem. 2025 Apr;319:107389. doi: 10.1016/j.bpc.2025.107389. Epub 2025 Jan 17.

Dynamics of protein folding: probing the kinetic network of folding-unfolding transitions with experiment and theory.

Biochim Biophys Acta. 2011 Aug;1814(8):1001-20. doi: 10.1016/j.bbapap.2010.09.013. Epub 2010 Sep 29.

Simulation of folding of a small alpha-helical protein in atomistic detail using worldwide-distributed computing.

J Mol Biol. 2002 Nov 8;323(5):927-37. doi: 10.1016/s0022-2836(02)00997-x.

Cited by

Structural Plasticity and Functional Dynamics of Pigeon Cryptochrome 4 as Avian Magnetoreceptor.

J Mol Biol. 2025 May 27:169233. doi: 10.1016/j.jmb.2025.169233.

Evolutionary Dynamics and Functional Differences in Clinically Relevant Pen β-Lactamases from spp.

J Chem Inf Model. 2025 May 26;65(10):5086-5098. doi: 10.1021/acs.jcim.5c00271. Epub 2025 May 2.

A beginner's approach to deep learning applied to VS and MD techniques.

J Cheminform. 2025 Apr 8;17(1):47. doi: 10.1186/s13321-025-00985-7.

Unsupervised Learning of Progress Coordinates during Weighted Ensemble Simulations: Application to NTL9 Protein Folding.

J Chem Theory Comput. 2025 Apr 8;21(7):3691-3699. doi: 10.1021/acs.jctc.4c01136. Epub 2025 Mar 19.

Generating Protein Structures for Pathway Discovery Using Deep Learning.

J Chem Theory Comput. 2024 Oct 22;20(20):8795-8806. doi: 10.1021/acs.jctc.4c00816. Epub 2024 Oct 10.

A Stochastic Landscape Approach for Protein Folding State Classification.

J Chem Theory Comput. 2024 Jul 9;20(13):5428-5438. doi: 10.1021/acs.jctc.4c00464. Epub 2024 Jun 26.

Deciphering the Coevolutionary Dynamics of L2 β-Lactamases via Deep Learning.

J Chem Inf Model. 2024 May 13;64(9):3706-3717. doi: 10.1021/acs.jcim.4c00189. Epub 2024 Apr 30.

AI-driven multiscale simulations illuminate mechanisms of SARS-CoV-2 spike dynamics.

Int J High Perform Comput Appl. 2021 Sep;35(5):432-451. doi: 10.1177/10943420211006452.

Intelligent resolution: Integrating Cryo-EM with AI-driven multi-resolution simulations to observe the severe acute respiratory syndrome coronavirus-2 replication-transcription machinery in action.

Int J High Perform Comput Appl. 2022 Nov;36(5-6):603-623. doi: 10.1177/10943420221113513. Epub 2022 Aug 5.

Deep learning workflow for the inverse design of molecules with specific optoelectronic properties.

Sci Rep. 2023 Nov 16;13(1):20031. doi: 10.1038/s41598-023-45385-9.
