ProtWave-VAE: Integrating Autoregressive Sampling with Latent-Based Inference for Data-Driven Protein Design.

Affiliations

Graduate Program in Biophysical Sciences, University of Chicago, Chicago, Illinois 60637, United States.

Department of Chemistry, University of Chicago, Chicago, Illinois 60637, United States.

Publication information

ACS Synth Biol. 2023 Dec 15;12(12):3544-3561. doi: 10.1021/acssynbio.3c00261. Epub 2023 Nov 21.

Abstract

Deep generative models (DGMs) have shown great success in the understanding and data-driven design of proteins. Variational autoencoders (VAEs) are a popular DGM approach that can learn the correlated patterns of amino acid mutations within a multiple sequence alignment (MSA) of protein sequences and distill this information into a low-dimensional latent space to expose phylogenetic and functional relationships and guide generative protein design. Autoregressive (AR) models are another popular DGM approach that typically lacks a low-dimensional latent embedding but does not require training sequences to be aligned into an MSA and enables the design of variable-length proteins. In this work, we propose ProtWave-VAE as a novel and lightweight DGM, employing an information-maximizing VAE with a dilated convolution encoder and an autoregressive WaveNet decoder. This architecture blends the strengths of the VAE and AR paradigms in enabling training over unaligned sequence data and the conditional generative design of variable-length sequences from an interpretable, low-dimensional learned latent space. We evaluated the model's ability to infer patterns and design rules within alignment-free homologous protein family sequences and to design novel synthetic proteins in four diverse protein families. We show that our model can infer meaningful functional and phylogenetic embeddings within latent spaces and make highly accurate predictions within semisupervised downstream fitness prediction tasks. In an application to the C-terminal SH3 domain in the Sho1 transmembrane osmosensing receptor in baker's yeast, we subject ProtWave-VAE-designed sequences to experimental gene synthesis and select-seq assays for the osmosensing function to show that the model enables synthetic protein design, conditional C-terminus diversification, and engineering of the osmosensing function into SH3 paralogues.
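
The abstract names a dilated convolution encoder and an autoregressive WaveNet decoder. As a rough illustration of the underlying building block only, and not the authors' implementation, the following pure-Python sketch shows a causal dilated 1D convolution and how stacking layers with doubling dilations widens the receptive field. The function names, the kernel weights, and the dilation schedule (1, 2, 4) are assumptions chosen for the example.

```python
from typing import List, Sequence


def causal_dilated_conv1d(
    x: Sequence[float], weights: Sequence[float], dilation: int
) -> List[float]:
    """Causal 1D convolution: out[t] uses only x[t], x[t-d], x[t-2d], ...

    Positions before the start of the sequence are treated as zeros,
    so no output ever depends on a future input.
    """
    out = []
    for t in range(len(x)):
        total = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation  # look back k * dilation steps
            if idx >= 0:  # implicit left zero-padding
                total += w * x[idx]
        out.append(total)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input positions that can influence one output position."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical schedule: three layers, kernel size 2, dilations 1, 2, 4.
dilations = (1, 2, 4)
signal = [1.0] + [0.0] * 9  # unit impulse at position 0
h = signal
for d in dilations:
    h = causal_dilated_conv1d(h, [1.0, 1.0], d)

# The impulse spreads forward over exactly receptive_field(...) positions.
print(h)
print(receptive_field(2, dilations))
```

In a WaveNet-style decoder this causality is what permits autoregressive sampling: each residue is predicted from the residues already generated, and, in a model like the one described, additionally conditioned on the latent vector.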
