Latent generative landscapes as maps of functional diversity in protein sequence space.

Affiliations

Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA.

Department of Bioengineering, University of Texas at Dallas, Richardson, TX, 75080, USA.

Publication information

Nat Commun. 2023 Apr 19;14(1):2222. doi: 10.1038/s41467-023-37958-z.

DOI: 10.1038/s41467-023-37958-z

PMID: 37076519

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10113739/

Abstract

Variational autoencoders are unsupervised learning models with generative capabilities. When applied to protein data, they classify sequences by phylogeny and generate de novo sequences that preserve statistical properties of protein composition. While previous studies have focused on clustering and generative features, here we evaluate the underlying latent manifold in which sequence information is embedded. To investigate properties of the latent manifold, we use direct coupling analysis and a Potts Hamiltonian model to construct a latent generative landscape. We showcase how this landscape captures phylogenetic groupings and functional and fitness properties of several systems, including globins, β-lactamases, ion channels, and transcription factors. We provide support for how the landscape helps explain the effects of sequence variability observed in experimental data and offers insights into directed and natural protein evolution. We propose that combining the generative properties and functional predictive power of variational autoencoders with coevolutionary analysis could benefit protein engineering and design.
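As a rough illustration of the idea in the abstract, the sketch below scores decoded latent-space points with a Potts Hamiltonian of the standard direct coupling analysis form, H(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j). It is a minimal sketch under stated assumptions, not the authors' implementation: the two-dimensional latent grid, the argmax decoding step, the `decode` callable (a hypothetical stand-in for a trained VAE decoder), and the array layouts of `h` and `J` are all assumptions made for illustration.

```python
import numpy as np


def potts_hamiltonian(seq, h, J):
    """Potts energy H(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

    seq: integer array of length L (residue indices in [0, q)).
    h:   local fields, shape (L, q)          (assumed layout).
    J:   pairwise couplings, shape (L, L, q, q) (assumed layout).
    """
    seq = np.asarray(seq, dtype=int)
    pos = np.arange(len(seq))
    field_term = h[pos, seq].sum()
    # pairwise[i, j] = J[i, j, seq[i], seq[j]]
    pairwise = J[pos[:, None], pos[None, :], seq[:, None], seq[None, :]]
    # Count each pair once by keeping only i < j.
    coupling_term = np.triu(pairwise, k=1).sum()
    return -(field_term + coupling_term)


def latent_landscape(decode, h, J, xs, ys):
    """Evaluate the Potts energy of the decoded sequence at each grid point.

    decode: hypothetical callable mapping a 2-D latent vector to an
            (L, q) array of per-position residue probabilities.
    Returns an array of shape (len(ys), len(xs)).
    """
    landscape = np.empty((len(ys), len(xs)))
    for r, y in enumerate(ys):
        for c, x in enumerate(xs):
            probs = decode(np.array([x, y]))
            seq = probs.argmax(axis=1)  # most probable residue per position
            landscape[r, c] = potts_hamiltonian(seq, h, J)
    return landscape


if __name__ == "__main__":
    # Toy parameters only; real h and J would come from a DCA fit
    # to a multiple sequence alignment.
    rng = np.random.default_rng(0)
    L, q = 5, 4
    h = rng.normal(size=(L, q))
    J = rng.normal(size=(L, L, q, q))
    W = rng.normal(size=(2, L * q))

    def toy_decode(z):
        # Linear map plus softmax, standing in for a trained decoder.
        logits = (z @ W).reshape(L, q)
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    grid = np.linspace(-3.0, 3.0, 7)
    print(latent_landscape(toy_decode, h, J, grid, grid).shape)
```

In this convention, lower energies mark latent regions whose decoded sequences are more compatible with the coevolutionary model.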

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1150/10115828/68c0812025fb/41467_2023_37958_Fig1_HTML.jpg
